Sugar and Lipid Metabolism Regulators in Plants IV
CROSS REFERENCE TO RELATED APPLICATIONS 

[001] The present invention claims the priority benefit of U.S. Provisional Patent
Application Serial No. 60/400,803 filed August 2, 2002, the entire contents of which are 
hereby incorporated by reference. 

BACKGROUND OF THE INVENTION 
Field of the Invention 

[002] This invention relates generally to nucleic acid sequences encoding proteins
that are related to the presence of seed storage compounds in plants. More specifically, the 
present invention relates to nucleic acid sequences encoding sugar and lipid metabolism 
regulator proteins and the use of these sequences in transgenic plants. The invention further 
relates to methods of applying these novel plant polypeptides to the identification and 
stimulation of plant growth and/or to the increase of yield of seed storage compounds. 

Background Art 

[003] The study and genetic manipulation of plants have a long history that began
even before the famed studies of Gregor Mendel. In perfecting this science, scientists have
accomplished modification of particular traits in plants ranging from potato tubers having
increased starch content to oilseed plants such as canola and sunflower having increased or
altered fatty acid content. With the increased consumption and use of plant oils, the
modification of seed oil content and seed oil levels has become increasingly widespread (e.g.
Topfer et al., 1995, Science 268:681-686). Manipulation of biosynthetic pathways in
transgenic plants provides a number of opportunities for molecular biologists and plant
biochemists to affect plant metabolism, giving rise to the production of specific higher-value
products. Seed oil production or composition has been altered in numerous traditional
oilseed plants such as soybean (U.S. Patent No. 5,955,650), canola (U.S. Patent No.
5,955,650), sunflower (U.S. Patent No. 6,084,164), and rapeseed (Topfer et al., 1995, Science
268:681-686), and in non-traditional oilseed plants such as tobacco (Cahoon et al., 1992, Proc.
Natl. Acad. Sci. USA 89:11184-11188).

[004] Plant seed oils comprise both neutral and polar lipids (See Table 1). The
neutral lipids contain primarily triacylglycerol, which is the main storage lipid that
accumulates in oil bodies in seeds. The polar lipids are mainly found in the various
membranes of the seed cells, e.g. the endoplasmic reticulum, microsomal membranes, and the
cell membrane. The neutral and polar lipids contain several common fatty acids (See Table 2)
and a range of less common fatty acids. The fatty acid composition of membrane lipids is
highly regulated, and only a select number of fatty acids are found in membrane lipids. On the
other hand, a large number of unusual fatty acids can be incorporated into the neutral storage
lipids in seeds of many plant species (Van de Loo F.J. et al., 1993, Unusual Fatty Acids in
Lipid Metabolism in Plants pp. 91-126, editor TS Moore Jr. CRC Press; Millar et al., 2000,
Trends Plant Sci. 5:95-101).



Table 1
Plant Lipid Classes

Neutral Lipids    Triacylglycerol (TAG)
                  Diacylglycerol (DAG)
                  Monoacylglycerol (MAG)

Polar Lipids      Monogalactosyldiacylglycerol (MGDG)
                  Digalactosyldiacylglycerol (DGDG)
                  Phosphatidylglycerol (PG)
                  Phosphatidylcholine (PC)
                  Phosphatidylethanolamine (PE)
                  Phosphatidylinositol (PI)
                  Phosphatidylserine (PS)
                  Sulfoquinovosyldiacylglycerol

Table 2
Common Plant Fatty Acids

16:0      Palmitic acid
16:1      Palmitoleic acid
16:3      Palmitolenic acid
18:0      Stearic acid
18:1      Oleic acid
18:2      Linoleic acid
18:3      Linolenic acid
γ-18:3    Gamma-linolenic acid*
20:0      Arachidic acid
20:1      Eicosenoic acid
22:6      Docosahexaenoic acid (DHA)*
20:2      Eicosadienoic acid
20:4      Arachidonic acid (AA)*
20:5      Eicosapentaenoic acid (EPA)*
22:1      Erucic acid



[005] In Table 2, the fatty acids denoted with an asterisk do not normally occur in
plant seed oils, but their production in transgenic plant seed oil is of importance in plant 
biotechnology. 
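
The shorthand used in Table 2 gives the number of carbon atoms, a colon, and the number of double bonds (e.g., 18:3 denotes an 18-carbon chain with three double bonds). As an illustrative sketch only, and not part of the disclosed methods, the following Python fragment parses that notation. The names FattyAcid, parse_shorthand, and is_saturated are hypothetical helpers introduced here for illustration.

```python
from typing import NamedTuple


class FattyAcid(NamedTuple):
    """Carbon and double-bond counts read from a 'C:D' shorthand (hypothetical helper)."""
    carbons: int
    double_bonds: int


def parse_shorthand(code: str) -> FattyAcid:
    """Parse shorthand such as '18:3' or 'γ-18:3' into a FattyAcid."""
    # Drop an optional isomer prefix (e.g. 'γ-') and keep the numeric 'C:D' part.
    numeric = code.strip().rsplit("-", 1)[-1]
    carbons, double_bonds = numeric.split(":")
    return FattyAcid(int(carbons), int(double_bonds))


def is_saturated(fatty_acid: FattyAcid) -> bool:
    """A fatty acid with no double bonds is saturated."""
    return fatty_acid.double_bonds == 0


if __name__ == "__main__":
    for code in ("16:0", "18:1", "γ-18:3", "22:6"):
        fa = parse_shorthand(code)
        kind = "saturated" if is_saturated(fa) else "unsaturated"
        print(f"{code}: {fa.carbons} carbons, {fa.double_bonds} double bonds ({kind})")
```

The prefix handling assumes that any isomer label is separated from the numeric part by a hyphen, as in the γ-18:3 entry of Table 2.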

[006] Lipids are synthesized from fatty acids, and their synthesis may be divided into
two parts: the prokaryotic pathway and the eukaryotic pathway (Browse et al., 1986,
Biochemical J. 235:25-31; Ohlrogge & Browse, 1995, Plant Cell 7:957-970). The prokaryotic
pathway is located in plastids, the primary site of fatty acid biosynthesis. Fatty acid synthesis
begins with the conversion of acetyl-CoA to malonyl-CoA by acetyl-CoA carboxylase
(ACCase). Malonyl-CoA is converted to malonyl-acyl carrier protein (ACP) by the malonyl-
CoA:ACP transacylase. The enzyme beta-keto-acyl-ACP-synthase III (KAS III) catalyzes a
condensation reaction in which the acyl group from acetyl-CoA is transferred to malonyl-
ACP to form 3-ketobutyryl-ACP. In a subsequent series of condensation, reduction and
dehydration reactions, the nascent fatty acid chain on the ACP cofactor is elongated by the
step-by-step addition (condensation) of two carbon atoms donated by malonyl-ACP until a
16-carbon or 18-carbon saturated fatty acid chain is formed. The plastidial delta-9 acyl-ACP
desaturase introduces the first double bond into the fatty acid. Thioesterases
cleave the fatty acids from the ACP cofactor, and free fatty acids are exported to the
cytoplasm where they participate as fatty acyl-CoA esters in the eukaryotic pathway. In the
eukaryotic pathway, the fatty acids are esterified by glycerol-3-phosphate acyltransferase and
lysophosphatidic acid acyltransferase to the sn-1 and sn-2 positions of glycerol-3-phosphate,
respectively, to yield phosphatidic acid (PA). The PA is the precursor for other polar and
neutral lipids, the latter being formed in the Kennedy pathway (Voelker, 1996, Genetic
Engineering ed.:Setlow 18:111-113; Shanklin & Cahoon, 1998, Annu. Rev. Plant Physiol.
Plant Mol. Biol. 49:611-641; Frentzen, 1998, Lipids 100:161-166; Millar et al., 2000, Trends
Plant Sci. 5:95-101).
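
The chain-length arithmetic implied by the elongation scheme above can be illustrated with a short calculation. The sketch below is not part of the disclosure; it assumes only what the paragraph states, namely that the first condensation joins a two-carbon acetyl group to malonyl-ACP to give a four-carbon product and that each later condensation adds two further carbons. The function name condensation_steps is hypothetical.

```python
def condensation_steps(chain_length: int) -> int:
    """Number of condensation reactions needed to build a saturated acyl chain.

    Assumption (from the text): the chain starts from a 2-carbon acetyl group
    and every condensation with malonyl-ACP adds 2 carbons.
    """
    if chain_length < 4 or chain_length % 2 != 0:
        raise ValueError("chain_length must be an even number of at least 4")
    return (chain_length - 2) // 2


if __name__ == "__main__":
    for length in (16, 18):
        steps = condensation_steps(length)
        print(f"C{length}: {steps} condensations, {steps} malonyl-ACP units consumed")
```

Under these assumptions a 16-carbon chain requires seven condensations and an 18-carbon chain requires eight.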

[007] Storage lipids in seeds are synthesized from carbohydrate-derived precursors.
Plants have a complete glycolytic pathway in the cytosol (Plaxton, 1996, Annu. Rev. Plant
Physiol. Plant Mol. Biol. 47:185-214), and it has been shown that a complete pathway also
exists in the plastids of rapeseeds (Kang & Rawsthorne, 1994, Plant J. 6:795-805). Sucrose is
the primary source of carbon and energy, transported from the leaves into the developing
seeds. During the storage phase of seeds, sucrose is converted in the cytosol to provide the
metabolic precursors glucose-6-phosphate and pyruvate. These are transported into the
plastids and converted into acetyl-CoA that serves as the primary precursor for the synthesis
of fatty acids. Acetyl-CoA in the plastids is the central precursor for lipid biosynthesis.
Acetyl-CoA can be formed in the plastids by different reactions, and the exact contribution of
each reaction is still being debated (Ohlrogge & Browse, 1995, Plant Cell 7:957-970). It is
accepted, however, that a large part of the acetyl-CoA is derived from glucose-6-phosphate
and pyruvate that are imported from the cytoplasm into the plastids. Sucrose is produced in
the source organs (leaves, or anywhere that photosynthesis occurs) and is transported to the
developing seeds, which are also termed sink organs. In the developing seeds, sucrose is the
precursor for all the storage compounds, i.e. starch, lipids and partly the seed storage
proteins. Therefore, it is clear that carbohydrate metabolism, in which sucrose plays a central
role, is very important to the accumulation of seed storage compounds.

[008] Although lipid and fatty acid content of seed oil can be modified by the
traditional methods of plant breeding, the advent of recombinant DNA technology has
allowed for easier manipulation of the seed oil content of a plant, and in some cases, has
allowed for the alteration of seed oils in ways that could not be accomplished by breeding
alone (See, e.g., Topfer et al., 1995, Science 268:681-686). For example, introduction of a
Δ12-hydroxylase nucleic acid sequence into transgenic tobacco resulted in the introduction of
a novel fatty acid, ricinoleic acid, into the tobacco seed oil (Van de Loo et al., 1995, Proc.
Natl. Acad. Sci. USA 92:6743-6747). Tobacco plants have also been engineered to produce
low levels of petroselinic acid by the introduction and expression of an acyl-ACP desaturase
from coriander (Cahoon et al., 1992, Proc. Natl. Acad. Sci. USA 89:11184-11188).

[009] The modification of seed oil content in plants has significant medical, nutritional, and
economic ramifications. With regard to the medical ramifications, the long chain fatty acids
(C18 and longer) found in many seed oils have been linked to reductions in
hypercholesterolemia and other clinical disorders related to coronary heart disease (Brenner,
1976, Adv. Exp. Med. Biol. 83:85-101). Therefore, consumption of a plant having increased
levels of these types of fatty acids may reduce the risk of heart disease. Enhanced levels of
seed oil content also increase large-scale production of seed oils and thereby reduce the cost
of these oils.

[010] In order to increase or alter the levels of compounds such as seed oils in plants,
nucleic acid sequences and proteins regulating lipid and fatty acid metabolism must be
identified. As mentioned earlier, several desaturase nucleic acids such as the Δ6-desaturase
nucleic acid, Δ12-desaturase nucleic acid and acyl-ACP desaturase nucleic acid have been
cloned and demonstrated to encode enzymes required for fatty acid synthesis in various plant
species. Oleosin nucleic acid sequences from such different species as Brassica, soybean,
carrot, pine, and Arabidopsis thaliana have also been cloned and determined to encode
proteins associated with the phospholipid monolayer membrane of oil bodies in those plants.

[011] It has also been determined that two phytohormones, gibberellic acid (GA) and
abscisic acid (ABA), are involved in overall regulatory processes in seed development (e.g.
Ritchie & Gilroy, 1998, Plant Physiol. 116:765-776; Arenas-Huertero et al., 2000, Genes
Dev. 14:2085-2096). Both the GA and ABA pathways are affected by okadaic acid, a protein
phosphatase inhibitor (Kuo et al., 1996, Plant Cell 8:259-269). The regulation of protein
phosphorylation by kinases and phosphatases is accepted as a universal mechanism of
cellular control (Cohen, 1992, Trends Biochem. Sci. 17:408-413). Likewise, the plant
hormones ethylene (e.g. Zhou et al., 1998, Proc. Natl. Acad. Sci. USA 95:10294-10299;
Beaudoin et al., 2000, Plant Cell 12:1103-1115) and auxin (e.g. Colon-Carmona et al.,
2000, Plant Physiol. 124:1728-1738) are involved in controlling plant development as well.

[012] Although several compounds are known that generally affect plant and seed
development, there is a clear need to specifically identify factors that are more specific for the
developmental regulation of storage compound accumulation and to identify genes which
have the capacity to confer altered or increased oil production to their host plant and to other
plant species. This invention discloses a large number of nucleic acid sequences from
Arabidopsis thaliana, Brassica napus, and the moss Physcomitrella patens. These nucleic
acid sequences can be used to alter or increase the levels of seed storage compounds such as
proteins, sugars and oils, in plants, including transgenic plants, such as rapeseed, canola,
linseed, soybean, sunflower, maize, oat, rye, barley, wheat, pepper, tagetes, cotton, oil palm,
coconut palm, flax, castor and peanut, which are oilseed plants containing high amounts of
lipid compounds.

SUMMARY OF THE INVENTION 

[013] The present invention provides novel isolated nucleic acid and amino acid
sequences associated with the metabolism of seed storage compounds in plants.

[014] The present invention also provides an isolated nucleic acid from Arabidopsis,
Brassica, and Physcomitrella patens encoding a Lipid Metabolism Protein (LMP), or a
portion thereof. These sequences may be used to modify or increase lipids and fatty acids,
cofactors and enzymes in microorganisms and plants.

[015] Arabidopsis plants are known to produce considerable amounts of fatty acids such as
linoleic and linolenic acid (See, e.g., Table 2) and are known for their close similarity in many
aspects (gene homology, etc.) to the oil crop plant Brassica. Therefore, nucleic acid molecules
originating from a plant like Arabidopsis thaliana or Brassica napus are especially suited to
modify the lipid and fatty acid metabolism in a host, especially in microorganisms and plants.
Furthermore, nucleic acids from the plants Arabidopsis thaliana and Brassica napus can be
used to identify those DNA sequences and enzymes in other species which are useful to
modify the biosynthesis of precursor molecules of fatty acids in the respective organisms.

[016] The present invention further provides an isolated nucleic acid comprising a
fragment of at least 15 nucleotides of a nucleic acid from a plant (Arabidopsis thaliana,
Brassica napus, or Physcomitrella patens) encoding a Lipid Metabolism Protein (LMP), or a
portion thereof.

[017] Also provided by the present invention are polypeptides encoded by the
nucleic acids, heterologous polypeptides comprising polypeptides encoded by the nucleic 
acids, and antibodies to those polypeptides. 

[018] Additionally, the present invention relates to and provides the use of LMP
nucleic acids in the production of transgenic plants having a modified level of a seed storage
compound. A method of producing a transgenic plant with a modified level of a seed storage
compound includes the steps of transforming a plant cell with an expression vector
comprising an LMP nucleic acid, and generating a plant with a modified level of the seed
storage compound from the plant cell. In a preferred embodiment, the plant is an oil
producing species selected from the group consisting of rapeseed, canola, linseed, soybean,
sunflower, maize, oat, rye, barley, wheat, pepper, tagetes, cotton, oil palm, coconut palm,
flax, castor, and peanut, for example.

[019] According to the present invention, the compositions and methods described
herein can be used to increase or decrease the level of an LMP in a transgenic plant
comprising increasing or decreasing the expression of the LMP nucleic acid in the plant.
Increased or decreased expression of the LMP nucleic acid can be achieved through in vivo
mutagenesis of the LMP nucleic acid. The present invention can also be used to increase or
decrease the level of a lipid in a seed oil, to increase or decrease the level of a fatty acid in a
seed oil, or to increase or decrease the level of a starch in a seed or plant.

[020] Also included herein is a seed produced by a transgenic plant transformed by an
LMP DNA sequence, wherein the seed contains the LMP DNA sequence and wherein the
plant is true breeding for a modified level of a seed storage compound. The present invention
additionally includes a seed oil produced by the aforementioned seed.

[021] Further provided by the present invention are vectors comprising the nucleic
acids, host cells containing the vectors, and descendent plant materials produced by
transforming a plant cell with the nucleic acids and/or vectors.

[022] According to the present invention, the compounds, compositions, and
methods described herein can be used to increase or decrease the level of a lipid in a seed oil, 
or to increase or decrease the level of a fatty acid in a seed oil, or to increase or decrease the 
level of a starch or other carbohydrate in a seed or plant. A method of producing a higher or 
lower than normal or typical level of storage compound in a transgenic plant, comprises 
expressing a LMP nucleic acid from Arabidopsis thaliana, Brassica napus, and 
Physcomitrella patens in the transgenic plant, wherein the transgenic plant is Arabidopsis 
thaliana and Brassica napus, or a species different from Arabidopsis thaliana and Brassica 
napus. Also included herein are compositions and methods of the modification of the 
efficiency of production of a seed storage compound. As used herein, the phrase 
"Arabidopsis thaliana and Brassica napus" means Arabidopsis thaliana and/or Brassica 
napus. 

[023] Accordingly, the present invention provides novel isolated LMP nucleic acids
and isolated LMP amino acid sequences from Arabidopsis thaliana, Brassica napus, and
Physcomitrella patens, as well as active fragments, analogs and orthologs thereof.

[024] The present invention also provides transgenic plants having modified levels
of seed storage compounds, and in particular, modified levels of a lipid, a fatty acid, or a
sugar.

[025] The polynucleotides and polypeptides of the present invention, including
agonists and/or fragments thereof, also have uses that include modulating plant growth, and
potentially plant yield, preferably increasing plant growth under adverse conditions (drought,
cold, light, UV). In addition, antagonists of the present invention may have uses that include
modulating plant growth and/or yield, preferably through increasing plant growth and yield.
In yet another embodiment, overexpression of the polypeptides of the present invention using
a constitutive promoter (e.g., 35S or other promoters) may be useful for increasing plant yield
under stress conditions (drought, light, cold, UV) by modulating light utilization efficiency.

[026] The present invention also provides methods for producing such
aforementioned transgenic plants. In another embodiment, the present invention provides
seeds and seed oils from such aforementioned transgenic plants.

[027] These and other embodiments, features, and advantages of the present
invention will become apparent after a review of the following detailed description of the 
disclosed embodiments and the appended claims. 

DETAILED DESCRIPTION OF THE INVENTION 

[028] The present invention may be understood more readily by reference to the
following detailed description of the preferred embodiments of the invention and the 
Examples included therein. 

[029] Before the present compounds, compositions, and methods are disclosed and
described, it is to be understood that this invention is not limited to specific nucleic acids, 
specific polypeptides, specific cell types, specific host cells, specific conditions, or specific 
methods, etc., as such may, of course, vary, and the numerous modifications and variations 
therein will be apparent to those skilled in the art. It is also to be understood that the 
terminology used herein is for the purpose of describing particular embodiments only and is 
not intended to be limiting. As used in the specification and in the claims, "a" or "an" can 
mean one or more, depending upon the context in which it is used. Thus, for example, 
reference to "a cell" can mean that at least one cell can be utilized. 

[030] In accordance with the purpose(s) of this invention, as embodied and broadly
described herein, this invention, in one aspect, provides an isolated nucleic acid from a plant
(Arabidopsis thaliana, Brassica napus, and Physcomitrella patens) encoding a Lipid
Metabolism Protein (LMP), or a portion thereof. As used herein, the phrase "Arabidopsis
thaliana, Brassica napus, and Physcomitrella patens" means Arabidopsis thaliana and/or
Brassica napus and/or Physcomitrella patens.

[031] One aspect of the invention pertains to isolated nucleic acid molecules that encode
LMP polypeptides or biologically active portions thereof, as well as nucleic acid fragments
sufficient for use as hybridization probes or primers for the identification or amplification of
an LMP-encoding nucleic acid (e.g., LMP DNA). As used herein, the terms "nucleic acid
molecule" and "polynucleotide sequence" are used interchangeably and are intended to
include DNA molecules (e.g., cDNA or genomic DNA) and RNA molecules (e.g., mRNA)
and analogs of the DNA or RNA generated using nucleotide analogs. This term also
encompasses untranslated sequence located at both the 3' and 5' ends of the coding region of
a gene: at least about 1000 nucleotides of sequence upstream from the 5' end of the coding
region and at least about 200 nucleotides of sequence downstream from the 3' end of the
coding region of the gene. The nucleic acid molecule can be single-stranded or double-
stranded, but preferably is double-stranded DNA. An "isolated" nucleic acid molecule is one
which is substantially separated from other nucleic acid molecules which are present in the
natural source of the nucleic acid. Preferably, an "isolated" nucleic acid is substantially free
of sequences which naturally flank the nucleic acid (i.e., sequences located at the 5' and 3'
ends of the nucleic acid) in the genomic DNA of the organism from which the nucleic acid is
derived. For example, in various embodiments, the isolated LMP nucleic acid molecule can
contain less than about 5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb, or 0.1 kb of nucleotide sequences
which naturally flank the nucleic acid molecule in genomic DNA of the cell from which the
nucleic acid is derived (e.g., an Arabidopsis thaliana or Brassica napus cell). Moreover, an
"isolated" nucleic acid molecule, such as a cDNA molecule, can be substantially free of other
cellular material, or culture medium when produced by recombinant techniques, or chemical
precursors, or other chemicals when chemically synthesized.
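
By way of illustration only, the extent of sequence covered by the definition above can be sketched as a coordinate calculation. The function gene_window and its parameters are hypothetical. Coordinates are assumed to be 1-based and inclusive, and the figures of about 1000 nucleotides upstream and about 200 nucleotides downstream are taken from the definition and treated as defaults rather than exact limits.

```python
from typing import Tuple


def gene_window(cds_start: int, cds_end: int, strand: str = "+",
                upstream: int = 1000, downstream: int = 200) -> Tuple[int, int]:
    """Return (start, end) of a coding region plus its flanking untranslated sequence.

    Assumptions: 1-based inclusive genomic coordinates with cds_start <= cds_end;
    'upstream' lies 5' of the coding region and 'downstream' lies 3' of it.
    """
    if cds_start < 1 or cds_end < cds_start:
        raise ValueError("expected 1 <= cds_start <= cds_end")
    if strand == "+":
        start, end = cds_start - upstream, cds_end + downstream
    elif strand == "-":
        # On the reverse strand the 5' flank sits at higher coordinates.
        start, end = cds_start - downstream, cds_end + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clamp at the first position of the molecule.
    return max(1, start), end


if __name__ == "__main__":
    print(gene_window(5000, 6500))              # forward-strand gene
    print(gene_window(5000, 6500, strand="-"))  # reverse-strand gene
```

The sketch does not clamp at the end of the chromosome or contig, because the length of the molecule is not an input.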

[032] A nucleic acid molecule of the present invention, e.g., a nucleic acid molecule
having a polynucleotide sequence of Appendix A (i.e. the polynucleotide sequence of SEQ
ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ
ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23,
SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID
NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45,
SEQ ID NO:47, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID
NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID NO:67, SEQ ID NO:69,
SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77, SEQ ID NO:79, or SEQ ID
NO:81), or a portion thereof, can be isolated using standard molecular biology techniques and
the sequence information provided herein. For example, an Arabidopsis thaliana, Brassica
napus, or Physcomitrella patens LMP cDNA can be isolated from an Arabidopsis thaliana,
Brassica napus, or Physcomitrella patens library using all or a portion of one of the
polynucleotide sequences of Appendix A as a hybridization probe and standard hybridization
techniques (e.g., as described in Sambrook et al., 1989, Molecular Cloning: A Laboratory
Manual. 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press,
Cold Spring Harbor, NY). Moreover, a nucleic acid molecule encompassing all or a portion
of one of the polynucleotide sequences of Appendix A can be isolated by the polymerase
chain reaction using oligonucleotide primers designed based upon this sequence (e.g., a
nucleic acid molecule encompassing all or a portion of one of the sequences of Appendix A
can be isolated by the polymerase chain reaction using oligonucleotide primers designed
based upon this same sequence of Appendix A). For example, mRNA can be isolated from
plant cells (e.g., by the guanidinium-thiocyanate extraction procedure of Chirgwin et al.,
1979, Biochemistry 18:5294-5299) and cDNA can be prepared using reverse transcriptase
(e.g., Moloney MLV reverse transcriptase, available from Gibco/BRL, Bethesda, MD; or
AMV reverse transcriptase, available from Seikagaku America, Inc., St. Petersburg, FL).
Synthetic oligonucleotide primers for polymerase chain reaction amplification can be
designed based upon one of the polynucleotide sequences shown in Appendix A. A nucleic
acid of the invention can be amplified using cDNA or, alternatively, genomic DNA, as a
template and appropriate oligonucleotide primers according to standard PCR amplification
techniques. The nucleic acid so amplified can be cloned into an appropriate vector and
characterized by DNA sequence analysis. Furthermore, oligonucleotides corresponding to an
LMP nucleotide sequence can be prepared by standard synthetic techniques, e.g., using an
automated DNA synthesizer.
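
The relationship between a template sequence and a pair of amplification primers designed from it can be sketched as follows. This is a minimal illustration, not the primer design procedure of the invention: the forward primer is simply taken from the 5' end of the template, and the reverse primer is the reverse complement of its 3' end. The function names and the toy template are hypothetical, and practical primer design would also weigh melting temperature, GC content, and secondary structure, none of which is modeled here.

```python
from typing import Tuple

# Watson-Crick complements for unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_primers(template: str, length: int = 20) -> Tuple[str, str]:
    """Return (forward, reverse) primers, both written 5'->3'.

    The forward primer matches the first `length` bases of the template;
    the reverse primer is the reverse complement of the last `length` bases.
    """
    template = template.upper()
    if len(template) < 2 * length:
        raise ValueError("template is too short for two non-overlapping primers")
    forward = template[:length]
    reverse = reverse_complement(template[-length:])
    return forward, reverse


if __name__ == "__main__":
    toy_template = "ATGGCCAAATTTGGGTGA"  # hypothetical 18-nt sequence, not from Appendix A
    fwd, rev = design_primers(toy_template, length=6)
    print("forward:", fwd)
    print("reverse:", rev)
```

A real template would be one of the polynucleotide sequences of Appendix A, and primer lengths of roughly 18 to 25 nucleotides are typical in practice.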

[033] In a preferred embodiment, an isolated nucleic acid of the invention comprises
one of the polynucleotide sequences shown in Appendix A (i.e. SEQ ID NO:1, SEQ ID
NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ
ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25,
SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID
NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47,
SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID




WO 2004/013304 PCT/US2003/024364

NO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71,
SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77, SEQ ID NO:79, or SEQ ID NO:81). These
polynucleotides of Appendix A correspond to the Arabidopsis thaliana, Brassica napus, and
Physcomitrella patens LMP cDNAs of the invention. These cDNAs comprise sequences
encoding LMPs (i.e., the "coding region" or open reading frame (ORF)), as well as 5'
untranslated sequences and 3' untranslated sequences. Alternatively, the nucleic acid
molecules can comprise only the coding region of any of the polynucleotide sequences
described herein or can contain whole genomic fragments isolated from genomic DNA.

[034] For the purposes of this application, it will be understood that each of the
polynucleotide sequences set forth in Appendix A has an identifying entry number (e.g.,
Pk123). Each of these sequences may generally comprise three parts: a 5' upstream region, a
coding region, and a downstream region. The particular polynucleotide sequences shown in
Appendix A represent the coding region or open reading frame, and the putative functions of
the encoded polypeptides are indicated in Table 3.

Table 3
Putative LMP Functions

Sequence code   Function                                                      SEQ ID NO:

Pk123           Gibberellin-regulated protein GASA3 precursor                 1
Pk197           Tyrosine aminotransferase                                     3
Pk136           D-hydroxy-fatty acid dehydrogenase                            5
Pk156           Serine protease                                               7
Pk159           Nonspecific lipid-transfer protein                            9
Pk179           Signal transduction protein                                   11
Pk202           Lipid transfer-like protein                                   13
Pk206           bZIP transcription factor                                     15
Pk207           Acyl-CoA dehydrogenase                                        17
Pk209           Pyruvate kinase                                               19
Pk215           Phosphatidylglycerotransferase                                21
Pk239           Digalactosyldiacylglycerol synthase                           23
Pk240           Phosphatidate cytidyltransferase                              25
Pk241           AT Psbs protein                                               27
Pk242           Omega-6 fatty acid desaturase, endoplasmic reticulum (FAD2)   29
Bn011           Gibberellin 3-beta hydroxylase with +4 G                      31
Bn077           Zinc finger DNA binding protein                               33
Jb001           Gibberellin 20-oxidase                                        35
Jb002           Seed maturation protein                                       37
Jb003           Beta-VPE Vacuolar Processing Enzyme                           39
Jb005           Very-long-chain fatty acid condensing enzyme CUT1             41
Jb007           Glucokinase                                                   43
Jb009           Glutathione S-transferase TSI-1                               45
Jb013           ABA-regulated gene                                            47
Jb017           Cysteine proteinase                                           51
Jb024           Pectinesterase-like protein                                   53
Jb027           Signal transduction protein                                   55
OO-1            Aldose reductase-like protein                                 57
OO-2            Dormancy related protein                                      59
OO-3            HSP associated protein like                                   61
OO-4            Poly (ADP-ribose) polymerase                                  63
OO-5            Transitional endoplasmic reticulum ATPase                     65
OO-6            Beta coat like protein                                        67
OO-8            Protein disulfide-isomerase                                   69
OO-9            Signal transduction protein/Apoptosis inhibitor               71
OO-10           Annexin                                                       73
OO-11           Putative oxidoreductase                                       75
OO-12           Long chain alcohol dehydrogenase/oxidoreductase               77
pp82            Transcription factor                                          79
Pk225           Amino-cyclopropane-carboxylic acid oxidase                    81



Table 4
Grouping of LMPs based on Functional protein domains

Functional category   SEQ ID  SEQ code  Functional domain                               Domain position

DNA-binding proteins  1       Pk123     Zinc finger                                     66-86
                                                                                        29-71
                      15      Pk206     bZIP transcription factor (PFAM)                144-197
                                        Leucine zipper                                  179-209
                      27      Pk241     DNA-binding domain                              207-221
                                        Histone H5 signature                            57-71
                      33      Bn077     Zinc finger (BRCT; PARP)                        64-104
                                        Ethylene responsive element binding protein     79-99
                      63      OO-4      Zinc finger                                     760-805
                                        Leucine zipper                                  114-117
                      73      OO-10     Zinc finger                                     220-230
                                        Yeast DNA-binding domain                        207-217
                      79      pp82                                                      19-119

Kinases               43      Jb007     Glucokinase                                     173-206
                      45      Jb009     Deoxynucleoside kinase                          99-139
                      19      Pk209     Pyruvate kinase (PFAM)                          1-326
                      61      OO-3      Galactokinase                                   285-296

Signal Transduction   67      OO-6      Wnt-1 domain                                    607-655
                                        WSC domain                                      527-548
                      71      OO-9      BIR repeat (inhibitor of apoptosis)             47-85
                                        Wnt-1 domain                                    43-91
                      41      Jb005     Wnt-1 domain                                    23-71
                      47      Jb013     Wnt-1 domain                                    23-91
                      55      Jb027     Emp24/gp25L intracellular vesicle trafficking   2-204
                                        Wnt-1 domain                                    135-183
                      11      Pk179     Wnt-1 domain                                    279-327
                                        PDZ domain (Wnt signalling)                     205-299
                      3       Pk197     Wnt-1 domain                                    300-348

Proteases             7       Pk156     Serine protease                                 171-191
                                        Prolyl aminopeptidase                           128-139
                      37      Jb002     Peptidase family M23/M37                        404-444
                      39      Jb003     Cysteine protease                               52-76
                                        Peptidase C13 (PFAM)                            10-367
                      51      Jb017     Cysteine protease C1                            163-178
                                        Peptidase C1 (PFAM)                             145-361
                      65      OO-5      Peptidase family M41                            343-387
                                                                                        620-664
                                        AAA ATPase molecular chaperone (PFAM)           243-427

Lipid metabolism      5       Pk136     D-hydroxy-fatty acid dehydrogenase              94-143
                      9       Pk159     Lipid Transfer Protein LTP (PFAM)               29-117
                      13      Pk202     Lipid Transfer Protein LTP (PFAM)               38-103
                      17      Pk207     Acyl-CoA dehydrogenase                          2-44
                                        Iron-containing alcohol dehydrogenase           97-112
                      21      Pk215     CDP-alcohol phosphatidyltransferase (PFAM)      172-309
                      23      Pk239     Glycosyl (galactosyl) transferase (PFAM)        572-674
                      25      Pk240     Phosphatidate cytidyltransferase                343-370
                      29      Pk242     Fatty acid desaturase (PFAM)                    32-376

Oxidoreductases       31      Bn011     Iron Ascorbate oxidoreductase (PFAM)            43-343
                      35      Jb001     Respiratory chain NADH dehydrogenase            95-123
                                        Iron Ascorbate oxidoreductase (PFAM)            54-369
                      53      Jb024     Multicopper oxidase                             216-247
                                                                                        123-145
                                        Copper-oxidase (PFAM)                           154-306
                      57      OO-1      Aldo/keto reductase family (PFAM)               18-294
                      59      OO-2      Alcohol dehydrogenase (PFAM)                    38-228
                      69      OO-8      Thioredoxin (PFAM)                              22-250
                      75      OO-11     Alcohol dehydrogenase (PFAM)                    50-234
                      77      OO-12     Zinc alcohol dehydrogenase (PFAM)               20-329
                      81      Pk225     Iron Ascorbate oxidoreductase (PFAM)            3-297



[0351 In another preferred embodiment, an isolated nucleic acid molecule of the 

present invention encodes a polypeptide that is able to participate in the metabolism of seed 
storage compounds such as lipids, starch, and seed storage proteins, and that contains a DNA- 
binding (or transcription factor) domain, a protein kinase domain, a signal transduction 
domain, a protease domain, a lipid metabolism domain, or an oxidoreductase domain. 
Examples of isolated nucleic acids that encode LMPs containing such domains can be found 
in Table 4. Examples of nucleic acids encoding LMPs containing a DNA-binding domain 
include those shown in SEQ ID NO:1, SEQ ID NO:15, SEQ ID NO:27, SEQ ID NO:33, SEQ
ID NO:63, SEQ ID NO:73, and SEQ ID NO:79. Examples of nucleic acids encoding LMPs 
containing a protein kinase domain include those shown in SEQ ID NO:19, SEQ ID NO:43, 
SEQ ID NO:45, and SEQ ID NO:61. Examples of nucleic acids encoding LMPs containing a 
signal transduction domain include those shown in SEQ ID NO:3, SEQ ID NO:11, SEQ ID
NO:41, SEQ ID NO:47, SEQ ID NO:55, SEQ ID NO:67, and SEQ ID NO:71. Examples of
nucleic acids encoding LMPs containing a protease domain include those shown in SEQ ID 
NO:7, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:51, and SEQ ID NO:65. Examples of 
nucleic acids encoding LMPs containing a lipid metabolism domain include those shown in 
SEQ ID NO:5, SEQ ID NO:9, SEQ ID NO:13, SEQ ID NO:17, SEQ ID NO:21, SEQ ID 
NO:23, SEQ ID NO:25, and SEQ ID NO:29. Examples of nucleic acids encoding LMPs
containing an oxidoreductase domain include those shown in SEQ ID NO:31, SEQ ID NO:35,
SEQ ID NO:53, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:69, SEQ ID NO:75, SEQ ID 
NO:77, and SEQ ID NO:81. 

[036] In another preferred embodiment, an isolated nucleic acid molecule of the 

invention comprises a nucleic acid molecule, which is a complement of one of the 
polynucleotide sequences shown in Appendix A (i.e. SEQ ID NO:1, SEQ ID NO:3, SEQ ID
NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ
ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, 
SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:37, SEQ ID 
NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:51,
SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID
NO:63, SEQ ID NO:65, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, 
SEQ ID NO:75, SEQ ID NO:77, SEQ ID NO:79, or SEQ ID NO:81), or a portion thereof. A 
nucleic acid molecule which is complementary to one of the polynucleotide sequences shown 
in Appendix A is one which is sufficiently complementary to one of the polynucleotide 
sequences shown in Appendix A such that it can hybridize to one of the nucleotide sequences 
shown in Appendix A, thereby forming a stable duplex. 

[037] In another preferred embodiment, an isolated nucleic acid of the invention 

comprises a polynucleotide sequence encoding a polypeptide selected from the group 
consisting of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, 
SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID
NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, 
SEQ ID NO:34, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID 
NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, 
SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64, SEQ ID NO:66, SEQ ID
NO:68, SEQ ID NO:70, SEQ ID NO:72, SEQ ID NO:74, SEQ ID NO:76, SEQ ID NO:78, 
SEQ ID NO:80, or SEQ ID NO:82. 

[038] In still another preferred embodiment, an isolated nucleic acid molecule of the 

invention comprises a polynucleotide sequence which is at least about 50-60%, preferably at 
least about 60-70%, more preferably at least about 70-80%, 80-90%, or 90-95%, and even 
more preferably at least about 95%, 96%, 97%, 98%, 99%, or more homologous to a 
polynucleotide sequence shown in Appendix A, or a portion thereof. In an additional 
preferred embodiment, an isolated nucleic acid molecule of the invention comprises a 
polynucleotide sequence which hybridizes, e.g., hybridizes under stringent conditions, to one 
of the polynucleotide sequences shown in Appendix A or a portion thereof. These stringent 
conditions include washing with a solution having a salt concentration of about 0.02 M at pH 
7 and about 60°C. In another embodiment, the stringent conditions comprise an initial 
hybridization in a 6X sodium chloride/sodium citrate (6X SSC) solution at 65°C. 
[039] Moreover, the nucleic acid molecule of the invention can comprise only a 

portion of the coding region of one of the sequences in Appendix A, for example a fragment 
which can be used as a probe or primer or a fragment encoding a biologically active portion 
of a LMP. The polynucleotide sequences determined from the cloning of the LMP genes 
from Arabidopsis thaliana, Brassica napus, and Physcomitrella patens allow for the
generation of probes and primers designed for use in identifying and/or cloning LMP
homologues in other cell types and organisms, as well as LMP homologues from other plants
or related species. Therefore this invention also provides compounds comprising the nucleic 
acids disclosed herein, or fragments thereof. These compounds include the nucleic acids 
attached to a moiety. These moieties include, but are not limited to, detection moieties, 
hybridization moieties, purification moieties, delivery moieties, reaction moieties, binding 
moieties, and the like. The probe/primer typically comprises a substantially purified
oligonucleotide. The oligonucleotide typically comprises a region of nucleotide sequence that 
hybridizes under stringent conditions to at least about 12, preferably about 25, more 
preferably about 40, 50, or 75 consecutive nucleotides of a sense strand of one of the 
sequences set forth in Appendix A, an anti-sense sequence of one of the sequences set forth 
in Appendix A, or naturally occurring mutants thereof. Primers based on a polynucleotide 
sequence of Appendix A can be used in PCR reactions to clone LMP homologues. Probes 
based on the LMP nucleotide sequences can be used to detect transcripts or genomic 
sequences encoding the same or homologous proteins. In preferred embodiments, the probe 
further comprises a label group attached thereto, e.g. the label group can be a radioisotope, a
fluorescent compound, an enzyme, or an enzyme co-factor. Such probes can be used as a 
part of a genomic marker test kit for identifying cells which express a LMP, such as by 
measuring a level of a LMP-encoding nucleic acid in a sample of cells, e.g., detecting LMP 
mRNA levels or determining whether a genomic LMP gene has been mutated or deleted. 
[040] In one embodiment, the nucleic acid molecule of the invention encodes a 

protein or portion thereof which includes an amino acid sequence which is sufficiently 
homologous to an amino acid sequence encoded by a sequence of Appendix A such that the protein or
portion thereof maintains the same or a similar function as the wild-type protein. As used 
herein, the language "sufficiently homologous" refers to proteins or portions thereof which 
have amino acid sequences which include a minimum number of identical or equivalent
amino acid residues to an amino acid sequence such that the protein or portion thereof is able 
to participate in the metabolism of compounds necessary for the production of seed storage 
compounds in plants, construction of cellular membranes in microorganisms or plants, or in 
the transport of molecules across these membranes. As used herein, an "equivalent" amino 
acid residue is, for example, an amino acid residue which has a similar side chain as a
particular amino acid residue that is encoded by a polynucleotide sequence of Appendix A. 
Regulatory proteins, such as DNA binding proteins, transcription factors, kinases, 
phosphatases, or protein members of metabolic pathways such as the lipid, starch and protein
biosynthetic pathways, or membrane transport systems, may play a role in the biosynthesis of
seed storage compounds. Examples of such activities are described herein (see putative 
annotations in Table 3). Examples of LMP-encoding nucleic acid sequences are set forth in 
Appendix A. 

[041] As altered or increased sugar and/or fatty acid production is a general trait that is
desirable to introduce into a wide variety of plants like maize, wheat, rye, oat, triticale, rice, barley,
soybean, peanut, cotton, rapeseed, canola, manihot, pepper, sunflower and tagetes, 
solanaceous plants like potato, tobacco, eggplant, and tomato, Vicia species, pea, alfalfa, 
bushy plants (coffee, cacao, tea), Salix species, trees (oil palm, coconut), perennial grasses, 
and forage crops, these crop plants are also preferred target plants for genetic engineering as 
one further embodiment of the present invention. As used herein, a "forage crop" includes, 
but is not limited to, Wheatgrass, Canarygrass, Bromegrass, Wildrye Grass, Bluegrass, 
Orchardgrass, Alfalfa, Sainfoin, Birdsfoot Trefoil, Alsike Clover, Red Clover, and Sweet
Clover. 

[042] Portions of proteins encoded by the LMP nucleic acid molecules of the

invention are preferably biologically active portions of one of the LMPs. As used herein, the 
term "biologically active portion of a LMP" is intended to include a portion, e.g., a 
domain/motif, of a LMP that participates in the metabolism of compounds necessary for the 
biosynthesis of seed storage lipids, or the construction of cellular membranes in 
microorganisms or plants, or in the transport of molecules across these membranes, or has an 
activity as set forth in Table 3. To determine whether a LMP or a biologically active portion 
thereof can participate in the metabolism of compounds necessary for the production of seed 
storage compounds and cellular membranes, an assay of enzymatic activity may be 
performed. Such assay methods are well known to those skilled in the art, and are described in
Example 14 of the Exemplification. 

[043] Biologically active portions of a LMP include peptides comprising amino acid 

sequences derived from the amino acid sequence of a LMP (e.g., an amino acid sequence 
encoded by a nucleic acid sequence of Appendix A (i.e. SEQ ID NO:1, SEQ ID NO:3, SEQ
ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15,
SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID
NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:37,
SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID
NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61,
SEQ ID NO:63, SEQ ID NO:65, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID
NO:73, SEQ ID NO:75, SEQ ID NO:77, SEQ ID NO:79, or SEQ ID NO:81) or the amino
acid sequence of a protein homologous to an LMP, which include fewer amino acids than a
full length LMP or the full length protein which is homologous to an LMP) and exhibit at
least one activity of an LMP. Typically, biologically active portions (peptides, e.g., peptides 
which are, for example, 5, 10, 15, 20, 30, 35, 36, 37, 38, 39, 40, 50, 100, or more amino acids 
in length) comprise a domain or motif with at least one activity of a LMP. Moreover, other 
biologically active portions, in which other regions of the protein are deleted, can be prepared 
by recombinant techniques and evaluated for one or more of the activities described herein. 
Preferably, the biologically active portions of a LMP include one or more selected 
domains/motifs or portions thereof having biological activity. 

[044] Additional nucleic acid fragments encoding biologically active portions of a 

LMP can be prepared by isolating a portion of one of the sequences, expressing the encoded 
portion of the LMP or peptide (e.g., by recombinant expression in vitro) and assessing the 
activity of the encoded portion of the LMP or peptide. 

[045] The invention further encompasses nucleic acid molecules that differ from one 

of the polynucleotide sequences shown in Appendix A (i.e. SEQ ID NO:1, SEQ ID NO:3,
SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID
NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, 
SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID 
NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, 
SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID 
NO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, 
SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77, SEQ ID NO:79, or SEQ ID NO:81) (and
portions thereof) due to degeneracy of the genetic code and thus encode the same LMP as
that encoded by the polynucleotide sequences shown in Appendix A. In a further 
embodiment, the nucleic acid molecule of the invention encodes a full length protein which is 
substantially homologous to an amino acid sequence shown in Appendix A. In one 
embodiment, the full-length nucleic acid or protein or fragment of the nucleic acid or protein 
is from Arabidopsis thaliana, Brassica napus, or Physcomitrella patens.
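
By way of illustration only, the effect of degeneracy of the genetic code described above can be sketched in Python. The sketch assumes the standard genetic code; the two short sequences and the function name `translate` are hypothetical and are not taken from Appendix A.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_dna: str) -> str:
    """Translate a coding-strand DNA sequence, stopping at the first stop codon."""
    coding_dna = coding_dna.upper()
    peptide = []
    # Ignore any trailing bases that do not form a complete codon.
    for i in range(0, len(coding_dna) - len(coding_dna) % 3, 3):
        residue = CODON_TABLE[coding_dna[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


# Hypothetical sequences that use different codons for the same residues.
variant_a = "ATGGCTAAATAA"
variant_b = "ATGGCCAAGTGA"
same_protein = variant_a != variant_b and translate(variant_a) == translate(variant_b)
```

Here `same_protein` is true because both hypothetical sequences encode the same three-residue peptide even though their nucleotide sequences differ.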
[046] In addition to the Arabidopsis thaliana, Brassica napus, and Physcomitrella 

patens LMP polynucleotide sequences described herein, it will be appreciated by those 
skilled in the art that DNA sequence polymorphisms that lead to changes in the amino acid
sequences of LMPs may exist within a population (e.g., the Arabidopsis thaliana,
Brassica napus, and Physcomitrella patens populations). Such genetic polymorphism in the
LMP gene may exist among individuals within a population due to natural variation. As used
herein, the terms "gene" and "recombinant gene" refer to nucleic acid molecules comprising 
an open reading frame encoding a LMP, preferably an Arabidopsis thaliana, Brassica napus, 
or Physcomitrella patens LMP. Such natural variations can typically result in 1-40% 
variance in the nucleotide sequence of the LMP gene. Any and all such nucleotide variations 
and resulting amino acid polymorphisms in LMP that are the result of natural variation and 
that do not alter the functional activity of LMPs are intended to be within the scope of the 
invention. 

[047] Nucleic acid molecules corresponding to natural variants and non-Arabidopsis

thaliana and Brassica napus orthologs of the Arabidopsis thaliana, Brassica napus, and 
Physcomitrella patens LMP cDNA of the invention can be isolated based on their homology 
to Arabidopsis thaliana, Brassica napus, and Physcomitrella patens LMP nucleic acid 
disclosed herein using the Arabidopsis thaliana, Brassica napus, and Physcomitrella patens 
cDNA, or a portion thereof, as a hybridization probe according to standard hybridization 
techniques under stringent hybridization conditions. As used herein, the term "orthologs" 
refers to two nucleic acids from different species that have evolved from a common
ancestral gene by speciation. Normally, orthologs encode proteins having the same or similar 
functions. Accordingly, in another embodiment, an isolated nucleic acid molecule of the 
invention is at least 15 nucleotides in length and hybridizes under stringent conditions to the 
nucleic acid molecule comprising a polynucleotide sequence shown in Appendix A. In other 
embodiments, the nucleic acid is at least 30, 50, 100, 250, or more nucleotides in length. As 
used herein, the term "hybridizes under stringent conditions" is intended to describe 
conditions for hybridization and washing under which nucleotide sequences at least 60% 
homologous to each other typically remain hybridized to each other. Preferably, the 
conditions are such that sequences at least about 65%, more preferably at least about 70%, 
and even more preferably at least about 75%, or more homologous to each other typically 
remain hybridized to each other. Such stringent conditions are known to those skilled in the 
art and can be found in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y. 
(1989), 6.3.1-6.3.6. A preferred, non-limiting example of stringent hybridization conditions
is hybridization in 6X sodium chloride/sodium citrate (SSC) at about 45°C, followed by one
or more washes in 0.2X SSC, 0.1% SDS at 50-65°C. In another embodiment, the stringent
conditions comprise an initial hybridization in a 6X sodium chloride/sodium citrate (6X SSC) 
solution at 65°C. Preferably, an isolated nucleic acid molecule of the invention that 
hybridizes under stringent conditions to a polynucleotide sequence of Appendix A (i.e. SEQ
ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ
ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23,
SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID
NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45,
SEQ ID NO:47, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID
NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID NO:67, SEQ ID NO:69,
SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77, SEQ ID NO:79, or SEQ ID
NO:81) corresponds to a naturally occurring nucleic acid molecule. As used herein, a 
"naturally-occurring" nucleic acid molecule refers to an RNA or DNA molecule having a 
polynucleotide sequence that occurs in nature (e.g., encodes a natural protein). In one 
embodiment, the nucleic acid encodes a natural Arabidopsis thaliana, Brassica napus, or 
Physcomitrella patens LMP. 

[048] In addition to naturally-occurring variants of the LMP sequence that may exist 

in the population, the skilled artisan will further appreciate that changes can be introduced by 
mutation into a polynucleotide sequence of Appendix A, thereby leading to changes in the 
amino acid sequence of the encoded LMP, without altering the functional ability of the LMP. 
For example, nucleotide substitutions leading to amino acid substitutions at "non-essential" 
amino acid residues can be made in a polynucleotide sequence of Appendix A. A "non- 
essential" amino acid residue is a residue that can be altered from the wild-type sequence of 
one of the LMPs (Appendix A) without altering the activity of said LMP, whereas an 
"essential" amino acid residue is required for LMP activity. Other amino acid residues, 
however, (e.g., those that are not conserved or only semi-conserved in the domain having 
LMP activity) may not be essential for activity and thus are likely to be amenable to 
alteration without altering LMP activity. 

[049] Accordingly, another aspect of the invention pertains to nucleic acid molecules 

encoding LMPs that contain changes in amino acid residues that are not essential for LMP 
activity. Such LMPs differ in amino acid sequence from a sequence yet retain at least one of 
the LMP activities described herein. In one embodiment, the isolated nucleic acid molecule 
comprises a nucleotide sequence encoding a protein, wherein the protein comprises an amino 
acid sequence at least about 50% homologous to an amino acid sequence encoded by a 
nucleic acid of Appendix A and is capable of participation in the metabolism of compounds 
necessary for the production of seed storage compounds in Arabidopsis thaliana, Brassica
napus, or Physcomitrella patens, or cellular membranes, or has one or more activities set
forth in Table 3. Preferably, the protein encoded by the nucleic acid molecule is at least
about 50-60% homologous to one of the sequences encoded by a nucleic acid of Appendix A
(i.e. SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID
NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21,
SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID
NO:33, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43,
SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID
NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID NO:67, 
SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77, SEQ ID 
NO:79, or SEQ ID NO:81), more preferably at least about 60-70% homologous to one of the 
sequences encoded by a nucleic acid of Appendix A, even more preferably at least about 70- 
80%, 80-90%, or 90-95% homologous to one of the sequences encoded by a nucleic acid of 
Appendix A and most preferably at least about 96%, 97%, 98%, or 99% homologous to one 
of the sequences encoded by a nucleic acid of Appendix A. 

[050] To determine the percent homology of two amino acid sequences (e.g., one of

the sequences encoded by a nucleic acid of Appendix A and a mutant form thereof) or of two
nucleic acids, the sequences are aligned for optimal comparison purposes (e.g., gaps can be 
introduced in the sequence of one protein or nucleic acid for optimal alignment with the other 
protein or nucleic acid). The amino acid residues or nucleotides at corresponding amino acid 
positions or nucleotide positions are then compared. When a position in one sequence (e.g., 
one of the sequences encoded by a nucleic acid of Appendix A) is occupied by the same 
amino acid residue or nucleotide as the corresponding position in the other sequence (e.g., a 
mutant form of the sequence encoded by a nucleic acid of Appendix A), then the molecules 
are homologous at that position (i.e., as used herein amino acid or nucleic acid "homology" is 
equivalent to amino acid or nucleic acid "identity"). The percent homology between the two 
sequences is a function of the number of identical positions shared by the sequences (i.e., %
homology = number of identical positions/total number of positions x 100).
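
The calculation just described can be sketched, purely as an illustration, in Python. The sketch assumes that the two sequences have already been aligned and padded with a gap character, and that the "total number of positions" is the length of the alignment including gap columns; both choices, and the function name `percent_homology`, are assumptions of this example rather than definitions taken from the specification.

```python
def percent_homology(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent homology (identity) of two pre-aligned sequences of equal length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    # A position counts as identical only if both sequences carry the same
    # residue there; a gap never counts as a match.
    identical = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return identical / len(aligned_a) * 100


# Hypothetical eight-residue peptides differing at one position: 7/8 identical.
example = percent_homology("MKTAYIAK", "MKTAHIAK")
```

Other conventions for the denominator (for example, the length of the shorter sequence) give different values, so the convention used should be stated whenever a percentage is reported.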
[051] An isolated nucleic acid molecule encoding a LMP homologous to a protein

sequence encoded by a nucleic acid of Appendix A can be created by introducing one or 
more nucleotide substitutions, additions, or deletions into a polynucleotide sequence of 
Appendix A such that one or more amino acid substitutions, additions, or deletions are 
introduced into the encoded protein. Mutations can be introduced into one of the sequences 
of Appendix A by standard techniques, such as site-directed mutagenesis and PCR-mediated 
mutagenesis. Preferably, conservative amino acid substitutions are made at one or more 
predicted non-essential amino acid residues. A "conservative amino acid substitution" is one
in which the amino acid residue is replaced with an amino acid residue having a similar side 
chain. Families of amino acid residues having similar side chains have been defined in the 
art. These families include amino acids with basic side chains (e.g., lysine, arginine, 
histidine), acidic side chains (e.g., aspartic acid, glutamic acid), uncharged polar side chains 
(e.g., glycine, asparagine, glutamine, serine, threonine, tyrosine, cysteine), nonpolar side 
chains (e.g., alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, 
tryptophan), beta-branched side chains (e.g., threonine, valine, isoleucine), and aromatic side 
chains (e.g., tyrosine, phenylalanine, tryptophan, histidine). Thus, a predicted non-essential 
amino acid residue in a LMP is preferably replaced with another amino acid residue from the 
same side chain family. Alternatively, in another embodiment, mutations can be introduced
randomly along all or part of a LMP coding sequence, such as by saturation mutagenesis, and 
the resultant mutants can be screened for a LMP activity described herein to identify mutants 
that retain LMP activity. Following mutagenesis of one of the sequences of Appendix A, the 
encoded protein can be expressed recombinantly and the activity of the protein can be
determined using, for example, assays described herein (see Examples 13-14 of the 
Exemplification). 
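
As a non-limiting illustration, the side chain families listed above can be written as a lookup table in Python, using one-letter amino acid codes. The names `SIDE_CHAIN_FAMILIES`, `shared_families`, and `is_conservative` are hypothetical and are introduced only for this sketch; membership follows the families exactly as recited in this paragraph.

```python
# Side chain families as recited in the text, in one-letter amino acid codes.
SIDE_CHAIN_FAMILIES = {
    "basic": set("KRH"),               # lysine, arginine, histidine
    "acidic": set("DE"),               # aspartic acid, glutamic acid
    "uncharged polar": set("GNQSTYC"),
    "nonpolar": set("AVLIPFMW"),
    "beta-branched": set("TVI"),
    "aromatic": set("YFWH"),
}


def shared_families(original: str, replacement: str) -> list[str]:
    """Names of the families that contain both residues, sorted alphabetically."""
    return sorted(
        name
        for name, members in SIDE_CHAIN_FAMILIES.items()
        if original in members and replacement in members
    )


def is_conservative(original: str, replacement: str) -> bool:
    """True if the substitution stays within at least one side chain family."""
    return original != replacement and bool(shared_families(original, replacement))
```

For example, `is_conservative("K", "R")` is true (both residues are basic), whereas `is_conservative("D", "K")` is false. Because some residues belong to more than one family, `shared_families` returns a list rather than a single name.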

[052] LMPs are preferably produced by recombinant DNA techniques. For example, 

a nucleic acid molecule encoding the protein is cloned into an expression vector (as described 
above), the expression vector is introduced into a host cell (as described herein), and the LMP 
is expressed in the host cell. The LMP can then be isolated from the cells by an appropriate 
purification scheme using standard protein purification techniques. As an alternative to
recombinant expression, a LMP or peptide thereof can be synthesized chemically using
standard peptide synthesis techniques. Moreover, native LMP can be isolated from cells, for 
example using an anti-LMP antibody, which can be produced by standard techniques 
utilizing a LMP or fragment thereof of this invention. 

[053] The invention also provides LMP chimeric or fusion proteins. As used herein, 

a LMP "chimeric protein" or "fusion protein" comprises a LMP polypeptide operatively 
linked to a non-LMP polypeptide. An "LMP polypeptide" refers to a polypeptide having an 
amino acid sequence corresponding to a LMP, whereas a "non-LMP polypeptide" refers to a 
polypeptide having an amino acid sequence corresponding to a protein which is not 
substantially homologous to the LMP, e.g., a protein which is different from the LMP and 
which is derived from the same or a different organism. As used herein with respect to the 
fusion protein, the term "operatively linked" is intended to indicate that the LMP polypeptide 
and the non-LMP polypeptide are fused to each other so that both sequences fulfill the
proposed function attributed to the sequence used. The non-LMP polypeptide can be fused to
the N-terminus or C-terminus of the LMP polypeptide. For example, in one embodiment, the
fusion protein is a GST-LMP (glutathione S-transferase) fusion protein in which the LMP 
sequences are fused to the C-terminus of the GST sequences. Such fusion proteins can 
facilitate the purification of recombinant LMPs. In another embodiment, the fusion protein is 
a LMP containing a heterologous signal sequence at its N-terminus. In certain host cells 
(e.g., mammalian host cells), expression and/or secretion of a LMP can be increased through 
use of a heterologous signal sequence. 

[054] Preferably, a LMP chimeric or fusion protein of the invention is produced by 

standard recombinant DNA techniques. For example, DNA fragments coding for the 
different polypeptide sequences are ligated together in-frame in accordance with conventional 
techniques, for example by employing blunt-ended or stagger-ended termini for ligation, 
restriction enzyme digestion to provide for appropriate termini, filling-in of cohesive ends as 
appropriate, alkaline phosphatase treatment to avoid undesirable joining, and enzymatic 
ligation. In another embodiment, the fusion gene can be synthesized by conventional 
techniques including automated DNA synthesizers. Alternatively, PCR amplification of gene 
fragments can be carried out using anchor primers which give rise to complementary 
overhangs between two consecutive gene fragments which can subsequently be annealed and 
reamplified to generate a chimeric gene sequence (See, for example, Current Protocols in 
Molecular Biology, eds. Ausubel et al., John Wiley & Sons: 1992). Moreover, many 
expression vectors are commercially available that already encode a fusion moiety (e.g., a 
GST polypeptide). An LMP-encoding nucleic acid can be cloned into such an expression 
vector such that the fusion moiety is linked in-frame to the LMP. 
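
The requirement that the fusion moiety be linked in-frame can be illustrated with a short Python sketch. It is a simplified frame check only: it assumes both inputs are complete coding sequences given as coding-strand DNA, removes a terminal stop codon from the N-terminal partner, and does not model restriction sites, linkers, or internal stop codons. The function name `fuse_in_frame` and the example sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def fuse_in_frame(n_terminal_cds: str, c_terminal_cds: str) -> str:
    """Join two coding sequences so the second is read in the frame of the first."""
    n_cds = n_terminal_cds.upper()
    c_cds = c_terminal_cds.upper()
    for name, cds in (("N-terminal", n_cds), ("C-terminal", c_cds)):
        if len(cds) % 3 != 0:
            raise ValueError(f"{name} coding sequence length is not a multiple of 3")
    # Drop the stop codon of the N-terminal partner so translation reads through
    # into the C-terminal partner.
    if n_cds[-3:] in STOP_CODONS:
        n_cds = n_cds[:-3]
    return n_cds + c_cds


# Hypothetical stand-ins for a fusion moiety and an LMP coding sequence.
fused = fuse_in_frame("ATGAAATAA", "ATGGCCTGA")
```

In this example the stop codon of the first sequence is removed and the second sequence follows directly, so that its codons remain in the reading frame established by the first.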

[055] In addition to the nucleic acid molecules encoding LMPs described above, 

another aspect of the invention pertains to isolated nucleic acid molecules which are antisense 
thereto. An "antisense" nucleic acid comprises a nucleotide sequence which is 
complementary to a "sense" nucleic acid encoding a protein, e.g., complementary to the 
coding strand of a double-stranded cDNA molecule or complementary to an mRNA 
sequence. Accordingly, an antisense nucleic acid can hydrogen bond to a sense nucleic acid. 
The antisense nucleic acid can be complementary to an entire LMP coding strand, or to only 
a portion thereof. In one embodiment, an antisense nucleic acid molecule is antisense to a 
"coding region" of the coding strand of a nucleotide sequence encoding a LMP. The term 
"coding region" refers to the region of the nucleotide sequence comprising codons which are 
translated into amino acid residues (e.g., the entire coding region of Pk121 comprises
nucleotides 1 to 786). In another embodiment, the antisense nucleic acid molecule is
antisense to a "noncoding region" of the coding strand of a nucleotide sequence encoding 
LMP. The term "noncoding region" refers to 5' and 3' sequences which flank the coding 
region that are not translated into amino acids (i.e., also referred to as 5' and 3' untranslated 
regions). 

[056] Given the coding strand sequences encoding LMP disclosed herein (e.g., the 

polynucleotide sequences set forth in Appendix A), antisense nucleic acids of the invention 
can be designed according to the rules of Watson and Crick base pairing. The antisense 
nucleic acid molecule can be complementary to the entire coding region of LMP mRNA, but 
more preferably is an oligonucleotide which is antisense to only a portion of the coding or 
noncoding region of LMP mRNA. For example, the antisense oligonucleotide can be
complementary to the region surrounding the translation start site of LMP mRNA. An 
antisense oligonucleotide can be, for example, about 5, 10, 15, 20, 25, 30, 35, 40, 45, or 50 
nucleotides in length. An antisense or sense nucleic acid of the invention can be constructed 
using chemical synthesis and enzymatic ligation reactions using procedures known in the art. 
For example, an antisense nucleic acid (e.g., an antisense oligonucleotide) can be chemically 
synthesized using naturally occurring nucleotides or variously modified nucleotides designed 
to increase the biological stability of the molecules or to increase the physical stability of the 
duplex formed between the antisense and sense nucleic acids, e.g., phosphorothioate 
derivatives and acridine substituted nucleotides can be used. Examples of modified 
nucleotides which can be used to generate the antisense nucleic acid include 5-fluorouracil,
5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, xanthine, 4-acetylcytosine,
5-(carboxyhydroxylmethyl) uracil, 5-carboxymethylaminomethyl-2-thiouridine,
5-carboxymethylaminomethyluracil, dihydrouracil, beta-D-galactosylqueosine, inosine,
N-6-isopentenyladenine, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine,
2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N-6-adenine,
7-methylguanine, 5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil,
beta-D-mannosylqueosine, 5'-methoxycarboxymethyluracil, 5-methoxyuracil,
2-methylthio-N-6-isopentenyladenine, uracil-5-oxyacetic acid (v), wybutoxosine, pseudouracil,
queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil,
uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic acid (v), 5-methyl-2-thiouracil,
3-(3-amino-3-N-2-carboxypropyl) uracil, (acp3)w, and 2,6-diaminopurine. Alternatively, the antisense
nucleic acid can be produced biologically using an expression vector into which a nucleic 
acid has been subcloned in an antisense orientation (i.e., RNA transcribed from the inserted
nucleic acid will be of an antisense orientation to a target nucleic acid of interest, described
further in the following subsection). 
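
The design of an antisense oligonucleotide according to the rules of Watson and Crick base pairing can be sketched as follows. This is an illustrative example only: it handles unmodified DNA bases, the coding strand shown is hypothetical and is not a sequence of Appendix A, and the names `reverse_complement` and `antisense_oligo` are introduced solely for this sketch.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the Watson-Crick reverse complement of a DNA sequence."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def antisense_oligo(coding_strand: str, start: int, length: int) -> str:
    """Antisense DNA oligonucleotide against coding_strand[start:start + length]."""
    if start < 0 or length <= 0 or start + length > len(coding_strand):
        raise ValueError("target window lies outside the coding strand")
    return reverse_complement(coding_strand[start:start + length])


# Hypothetical coding strand; the ATG start codon begins at index 5, so the
# 12-nucleotide window starting at index 2 spans the translation start site.
coding = "GGAACATGGCTAAAGGT"
oligo = antisense_oligo(coding, start=2, length=12)
```

The resulting oligonucleotide, read 5' to 3', is complementary to the selected window of the coding strand; taking its reverse complement recovers that window.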

[057] In another variation of the antisense technology, a double-stranded interfering RNA 
construct can be used to cause a down-regulation of the LMP mRNA level and LMP activity 
in transgenic plants. This requires transforming the plants with a chimeric construct 
containing a portion of the LMP sequence in the sense orientation fused to the antisense 
sequence of the same portion of the LMP sequence. A DNA linker region of variable length 
can be used to separate the sense and antisense fragments of LMP sequences in the construct. 
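
The arrangement of such a chimeric construct can be illustrated with a minimal Python sketch, given here only as an aid to understanding. The fragment and linker sequences are made-up placeholders and not sequences from Appendix A, and the function names are hypothetical. The sketch joins a sense fragment, a linker, and the reverse complement of the same fragment, so that the transcribed RNA could fold back on itself.

```python
# Illustrative sketch only: the sequences used below are made-up placeholders,
# not LMP sequences from Appendix A.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string containing A, C, G, T."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def build_hairpin_insert(sense_fragment, linker):
    """Join sense fragment + linker + antisense copy of the same fragment."""
    sense = sense_fragment.upper()
    return sense + linker.upper() + reverse_complement(sense)


if __name__ == "__main__":
    fragment = "ATGGCTTCAGGA"  # hypothetical portion of a target cDNA
    linker = "GTAAGTTTCT"      # hypothetical spacer; its length is arbitrary
    print(build_hairpin_insert(fragment, linker))
```

The linker is passed in as a separate argument because the text leaves its length variable.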
[058] The antisense nucleic acid molecules of the invention are typically 

administered to a cell or generated in situ such that they hybridize with or bind to cellular 
mRNA and/or genomic DNA encoding a LMP to thereby inhibit expression of the protein, 
e.g., by inhibiting transcription and/or translation. The hybridization can be by conventional 
nucleotide complementarity to form a stable duplex, or, for example, in the case of an 
antisense nucleic acid molecule which binds to DNA duplexes, through specific interactions 
in the major groove of the double helix. The antisense molecule can be modified such that it 
specifically binds to a receptor or an antigen expressed on a selected cell surface, e.g., by 
linking the antisense nucleic acid molecule to a peptide or an antibody which binds to a cell 
surface receptor or antigen. The antisense nucleic acid molecule can also be delivered to 
cells using the vectors described herein. To achieve sufficient intracellular concentrations of 
the antisense molecules, vector constructs in which the antisense nucleic acid molecule is 
placed under the control of a strong prokaryotic, viral, or eukaryotic (including plant) 
promoter are preferred. 

[059] In yet another embodiment, the antisense nucleic acid molecule of the 

invention is an α-anomeric nucleic acid molecule. An α-anomeric nucleic acid molecule forms 
specific double-stranded hybrids with complementary RNA in which, contrary to the usual 
β-units, the strands run parallel to each other (Gaultier et al., 1987, Nucleic Acids Res. 15:6625- 
6641). The antisense nucleic acid molecule can also comprise a 2'-O-methylribonucleotide 
(Inoue et al., 1987, Nucleic Acids Res. 15:6131-6148) or a chimeric RNA-DNA analogue 
(Inoue et al., 1987, FEBS Lett. 215:327-330). 

[060] In still another embodiment, an antisense nucleic acid of the invention is a 

ribozyme. Ribozymes are catalytic RNA molecules with ribonuclease activity which are 
capable of cleaving a single-stranded nucleic acid, such as an mRNA to which they have a 
complementary region. Thus, ribozymes (e.g., hammerhead ribozymes (described in 
Haselhoff & Gerlach, 1988, Nature 334:585-591)) can be used to catalytically cleave LMP 
mRNA transcripts to thereby inhibit translation of LMP mRNA. A ribozyme having 
specificity for an LMP-encoding nucleic acid can be designed based upon the nucleotide 
sequence of an LMP cDNA disclosed herein (e.g., Pk123 in Appendix A) or on the basis of a 
heterologous sequence to be isolated according to methods taught in this invention. For 
example, a derivative of a Tetrahymena L-19 IVS RNA can be constructed in which the 
nucleotide sequence of the active site is complementary to the nucleotide sequence to be 
cleaved in a LMP-encoding mRNA (See, e.g., U.S. Patent Nos. 4,987,071 and 5,116,742 to 
Cech et al.). Alternatively, LMP mRNA can be used to select a catalytic RNA having a 
specific ribonuclease activity from a pool of RNA molecules (See, e.g., Bartel, D. & Szostak 
J.W. 1993, Science 261:1411-1418). 
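
As a rough illustration of how candidate cleavage sites for a hammerhead ribozyme might be located in a target transcript, the following Python sketch scans an mRNA for a recognition triplet. The use of GUC as the default triplet is an assumption taken from general knowledge of hammerhead ribozymes and is not stated in this specification; the function name and the example sequence are hypothetical.

```python
from typing import List


def find_hammerhead_sites(mrna: str, triplet: str = "GUC") -> List[int]:
    """Return 0-based positions of the nucleotide that follows each triplet.

    DNA-style input is accepted; T is converted to U before scanning.
    """
    rna = mrna.upper().replace("T", "U")
    sites = []
    start = rna.find(triplet)
    while start != -1:
        sites.append(start + len(triplet))  # cleavage assumed 3' of the triplet
        start = rna.find(triplet, start + 1)
    return sites


if __name__ == "__main__":
    example = "AUGGUCAAGUCUU"  # made-up transcript fragment
    print(find_hammerhead_sites(example))
```

Flanking sequence around each reported position could then be used to choose the complementary arms of the ribozyme.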

[061] Alternatively, LMP gene expression can be inhibited by targeting nucleotide 

sequences complementary to the regulatory region of a LMP nucleotide sequence (e.g., a 
LMP promoter and/or enhancers) to form triple helical structures that prevent transcription of 
a LMP gene in target cells (See generally, Helene C., 1991, Anticancer Drug Des. 6:569-84; 
Helene C. et al., 1992, Ann. N.Y. Acad. Sci. 660:27-36; and Maher, L.J., 1992, Bioassays 
14:807-15). 

[062] Another aspect of the invention pertains to vectors, preferably expression 

vectors, containing a nucleic acid encoding a LMP (or a portion thereof). As used herein, the 
term "vector" refers to a nucleic acid molecule capable of transporting another nucleic acid to 
which it has been linked. One type of vector is a "plasmid", which refers to a circular double 
stranded DNA loop into which additional DNA segments can be ligated. Another type of 
vector is a viral vector, wherein additional DNA segments can be ligated into the viral 
genome. Certain vectors are capable of autonomous replication in a host cell into which they 
are introduced (e.g., bacterial vectors having a bacterial origin of replication and episomal 
mammalian vectors). Other vectors (e.g., non-episomal mammalian vectors) are integrated 
into the genome of a host cell upon introduction into the host cell, and thereby are replicated 
along with the host genome. Moreover, certain vectors are capable of directing the 
expression of genes to which they are operatively linked. Such vectors are referred to herein 
as "expression vectors." In general, expression vectors of utility in recombinant DNA 
techniques are often in the form of plasmids. In the present specification, "plasmid" and 
"vector" can be used interchangeably as the plasmid is the most commonly used form of 
vector. However, the invention is intended to include such other forms of expression vectors, 
such as viral vectors (e.g., replication defective retroviruses, adenoviruses and adeno- 
associated viruses), which serve equivalent functions. 
[063] The recombinant expression vectors of the invention comprise a nucleic acid 

of the invention in a form suitable for expression of the nucleic acid in a host cell, which 
means that the recombinant expression vectors include one or more regulatory sequences, 
selected on the basis of the host cells to be used for expression, which is operatively linked to 
the nucleic acid sequence to be expressed. As used herein with respect to a recombinant 
expression vector, "operatively linked" is intended to mean that the nucleotide sequence of 
interest is linked to the regulatory sequence(s) in a manner which allows for expression of the 
nucleotide sequence and both sequences are fused to each other so that each fulfills its 
proposed function (e.g., in an in vitro transcription/translation system or in a host cell when 
the vector is introduced into the host cell). The term "regulatory sequence" is intended to 
include promoters, enhancers, and other expression control elements (e.g., polyadenylation 
signals). Such regulatory sequences are described, for example, in Goeddel, Gene Expression 
Technology: Methods in Enzymology 185, Academic Press, San Diego, CA (1990) and 
Gruber and Crosby, in: Methods in Plant Molecular Biology and Biotechnology, CRC Press, 
Boca Raton, Florida, eds.: Glick & Thompson, Chapter 7, 89-108 including the references 
therein. Regulatory sequences include those which direct constitutive expression of a 
nucleotide sequence in many types of host cell and those which direct expression of the 
nucleotide sequence only in certain host cells or under certain conditions. It will be 
appreciated by those skilled in the art that the design of the expression vector can depend on 
such factors as the choice of the host cell to be transformed, the level of expression of protein 
desired, etc. The expression vectors of the invention can be introduced into host cells to 
thereby produce proteins or peptides, including fusion proteins or peptides, encoded by 
nucleic acids as described herein (e.g., LMPs, mutant forms of LMPs, fusion proteins, etc.). 
[064] The recombinant expression vectors of the invention can be designed for 

expression of LMPs in prokaryotic or eukaryotic cells. For example, LMP genes can be 
expressed in bacterial cells, insect cells (using baculovirus expression vectors), yeast and 
other fungal cells (See Romanos M.A. et al., 1992, Foreign gene expression in yeast: a 
review, Yeast 8:423-488; van den Hondel, C.A.M.J.J. et al., 1991, Heterologous gene 
expression in filamentous fungi, in: More Gene Manipulations in Fungi, Bennet & Lasure, 
eds., p. 396-428, Academic Press: San Diego; and van den Hondel & Punt, 1991, Gene transfer 
systems and vector development for filamentous fungi, in: Applied Molecular Genetics of 
Fungi, Peberdy et al., eds., p. 1-28, Cambridge University Press: Cambridge), algae 
(Falciatore et al., 1999, Marine Biotechnology 1:239-251), ciliates of the types: Holotrichia, 
Peritrichia, Spirotrichia, Suctoria, Tetrahymena, Paramecium, Colpidium, Glaucoma, 
Platyophrya, Potomacus, Pseudocohnilembus, Euplotes, Engelmaniella, and Stylonychia, 
especially of the genus Stylonychia lemnae with vectors following a transformation method 
as described in WO 98/01572, and multicellular plant cells (See Schmidt & Willmitzer, 1988, 
High efficiency Agrobacterium tumefaciens-mediated transformation of Arabidopsis thaliana 
leaf and cotyledon explants, Plant Cell Rep.:583-586; Plant Molecular Biology and 
Biotechnology, CRC Press, Boca Raton, Florida, chapter 6/7, p. 71-119 (1993); White, Jenes et 
al., Techniques for Gene Transfer, in: Transgenic Plants, Vol. 1, Engineering and Utilization, 
eds.: Kung and Wu, Academic Press 1993, 128-43; Potrykus, 1991, Annu. Rev. Plant 
Physiol. Plant Mol. Biol. 42:205-225 (and references cited therein)), or mammalian cells. 
Suitable host cells are discussed further in Goeddel, Gene Expression Technology: Methods 
in Enzymology 185, Academic Press, San Diego, CA (1990). Alternatively, the recombinant 
expression vector can be transcribed and translated in vitro, for example using T7 promoter 
regulatory sequences and T7 polymerase. 

[065] Expression of proteins in prokaryotes is most often carried out with vectors 

containing constitutive or inducible promoters directing the expression of either fusion or 
non-fusion proteins. Fusion vectors add a number of amino acids to a protein encoded 
therein, usually to the amino terminus of the recombinant protein but also to the C-terminus 
or fused within suitable regions in the proteins. Such fusion vectors typically serve one or 
more of the following purposes: 1) to increase expression of recombinant protein; 2) to 
increase the solubility of the recombinant protein; and 3) to aid in the purification of the 
recombinant protein by acting as a ligand in affinity purification. Often, in fusion expression 
vectors, a proteolytic cleavage site is introduced at the junction of the fusion moiety and the 
recombinant protein to enable separation of the recombinant protein from the fusion moiety 
subsequent to purification of the fusion protein. Such enzymes, and their cognate recognition 
sequences, include Factor Xa, thrombin and enterokinase. 

[066] Typical fusion expression vectors include pGEX (Pharmacia Biotech Inc; 

Smith & Johnson, 1988, Gene 67:31-40), pMAL (New England Biolabs, Beverly, MA), and 
pRIT5 (Pharmacia, Piscataway, NJ) which fuse glutathione S-transferase (GST), maltose E 
binding protein, or protein A, respectively, to the target recombinant protein. In one 
embodiment, the coding sequence of the LMP is cloned into a pGEX expression vector to 
create a vector encoding a fusion protein comprising, from the N-terminus to the C-terminus, 
GST-thrombin cleavage site-X protein. The fusion protein can be purified by affinity 
chromatography using glutathione-agarose resin. Recombinant LMP unfused to GST can be 
recovered by cleavage of the fusion protein with thrombin. 
[067] Examples of suitable inducible non-fusion E. coli expression vectors include 

pTrc (Amann et al., 1988, Gene 69:301-315) and pET 11d (Studier et al., 1990, Gene 
Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, California 
60-89). Target gene expression from the pTrc vector relies on host RNA polymerase 
transcription from a hybrid trp-lac fusion promoter. Target gene expression from the pET 
11d vector relies on transcription from a T7 gn10-lac fusion promoter mediated by a 
coexpressed viral RNA polymerase (T7 gn1). This viral polymerase is supplied by host 
strains BL21(DE3) or HMS174(DE3) from a resident prophage harboring a T7 gn1 gene 
under the transcriptional control of the lacUV5 promoter. 

[068] One strategy to maximize recombinant protein expression is to express the 

protein in a host bacterium with an impaired capacity to proteolytically cleave the recombinant 
protein (Gottesman S., 1990, Gene Expression Technology: Methods in Enzymology 
185:119-128, Academic Press, San Diego, California). Another strategy is to alter the 
nucleic acid sequence of the nucleic acid to be inserted into an expression vector so that the 
individual codons for each amino acid are those preferentially utilized in the bacterium 
chosen for expression (Wada et al., 1992, Nucleic Acids Res. 20:2111-2118). Such alteration 
of nucleic acid sequences of the invention can be carried out by standard DNA synthesis 
techniques. 
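
The codon alteration strategy described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the table of preferred codons passed to the function is a hypothetical example supplied by the user, not measured codon usage data, and the function names are invented for this sketch. The code replaces each codon with the preferred synonymous codon for the same amino acid, leaves other codons unchanged, and can be checked by confirming that the encoded protein is identical before and after.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons(dna):
    """Split a coding sequence into codons, requiring a length divisible by 3."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def translate(dna):
    """Translate a coding sequence; '*' marks a stop codon."""
    return "".join(CODON_TABLE[c] for c in codons(dna))


def recode(dna, preferred):
    """Swap each codon for the preferred synonymous codon, where one is given."""
    for aa, codon in preferred.items():
        if CODON_TABLE.get(codon) != aa:
            raise ValueError(f"{codon} does not encode {aa}")
    return "".join(preferred.get(CODON_TABLE[c], c) for c in codons(dna))


if __name__ == "__main__":
    # Hypothetical preferences for three amino acids; illustrative only.
    preferred = {"L": "CTG", "K": "AAA", "A": "GCG"}
    original = "ATGTTAAAGGCTTAA"  # made-up coding sequence
    optimized = recode(original, preferred)
    print(optimized, translate(original) == translate(optimized))
```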

[069] In another embodiment, the LMP expression vector is a yeast expression 

vector. Examples of vectors for expression in yeast S. cerevisiae include pYepSecl (Baldari 
et al., 1987, Embo J. 6:229-234), pMFa (Kurjan & Herskowitz, 1982, Cell 30:933-943), 
pJRY88 (Schultz et al., 1987, Gene 54:113-123), and pYES2 (Invitrogen Corporation, San 
Diego, CA). Vectors and methods for the construction of vectors appropriate for use in other 
fungi, such as the filamentous fungi, include those detailed in: van den Hondel & Punt, 1991, 
"Gene transfer systems and vector development for filamentous fungi", in: Applied Molecular 
Genetics of Fungi, Peberdy et al., eds., p. 1-28, Cambridge University Press: Cambridge. 
[070] Alternatively, the LMPs of the invention can be expressed in insect cells using 

baculovirus expression vectors. Baculovirus vectors available for expression of proteins in 
cultured insect cells (e.g., Sf 9 cells) include the pAc series (Smith et al., 1983, Mol. Cell 
Biol. 3:2156-2165) and the pVL series (Lucklow & Summers, 1989, Virology 170:31-39). 
[071] In yet another embodiment, a nucleic acid of the invention is expressed in 

mammalian cells using a mammalian expression vector. Examples of mammalian expression 
vectors include pCDM8 (Seed, 1987, Nature 329:840) and pMT2PC (Kaufman et al., 1987, 
EMBO J. 6:187-195). When used in mammalian cells, the expression vector's control 
functions are often provided by viral regulatory elements. For example, commonly used 
promoters are derived from polyoma, Adenovirus 2, cytomegalovirus, and Simian Virus 40. 
For other suitable expression systems for both prokaryotic and eukaryotic cells, see chapters 
16 and 17 of Sambrook, Fritsch and Maniatis, Molecular Cloning: A Laboratory Manual. 2nd 
ed., Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring 
Harbor, NY, 1989. 

[072] In another embodiment, the LMPs of the invention may be expressed in 

unicellular plant cells (such as algae; see Falciatore et al., 1999, Marine Biotechnology 
1:239-251 and references therein) and plant cells from higher plants (e.g., the spermatophytes, such 
as crop plants). Examples of plant expression vectors include those detailed in: Becker, 
Kemper, Schell and Masterson (1992, "New plant binary vectors with selectable markers 
located proximal to the left border", Plant Mol. Biol. 20:1195-1197) and Bevan (1984, 
"Binary Agrobacterium vectors for plant transformation", Nucleic Acids Res. 12:8711-8721; 
Vectors for Gene Transfer in Higher Plants; in: Transgenic Plants, Vol. 1, Engineering and 
Utilization, eds.: Kung and R. Wu, Academic Press, 1993, p. 15-38). 

[073] A plant expression cassette preferably contains regulatory sequences capable of driving 
gene expression in plant cells and which are operatively linked so that each sequence can 
fulfil its function, such as termination of transcription, including polyadenylation signals. 
Preferred polyadenylation signals are those originating from Agrobacterium tumefaciens 
T-DNA, such as gene 3, known as octopine synthase, of the Ti-plasmid pTiACH5 (Gielen et 
al. 1984, EMBO J. 3:835) or functional equivalents thereof, but all other terminators 
functionally active in plants are also suitable. 

[074] As plant gene expression is very often not limited to the transcriptional level, a plant 
expression cassette preferably contains other operatively linked sequences, such as translational 
enhancers, for example the overdrive sequence containing the 5'-untranslated leader sequence 
from tobacco mosaic virus, which enhances the protein per RNA ratio (Gallie et al. 1987, Nucleic 
Acids Res. 15:8693-8711). 

[075] Plant gene expression has to be operatively linked to an appropriate promoter 
conferring gene expression in a timely, cell or tissue specific manner. Preferred are promoters 
driving constitutive expression (Benfey et al. 1989, EMBO J. 8:2195-2202) like those derived 
from plant viruses like the 35S CaMV (Franck et al. 1980, Cell 21:285-294), the 19S CaMV 
(see also US 5,352,605 and WO 84/02913) or plant promoters like those from Rubisco small 
subunit described in US 4,962,028. Even more preferred are seed-specific promoters driving 
expression of LMP proteins during all or selected stages of seed development. Seed-specific 
plant promoters are known to those of ordinary skill in the art and are identified and 
characterized using seed-specific mRNA libraries and expression profiling techniques. Seed- 
specific promoters include the napin-gene promoter from rapeseed (US 5,608,152), the USP- 
promoter from Vicia faba (Baeumlein et al. 1991, Mol. Gen. Genetics 225:459-67), the 
oleosin-promoter from Arabidopsis (WO 98/45461), the phaseolin-promoter from Phaseolus 
vulgaris (US 5,504,200), the Bce4-promoter from Brassica (WO 91/13980) or the legumin B4 
promoter (LeB4; Baeumlein et al. 1992, Plant J. 2:233-239) as well as promoters conferring 
seed specific expression in monocot plants like maize, barley, wheat, rye, rice etc. Suitable 
promoters to note are the lpt2 or lpt1-gene promoter from barley (WO 95/15389 and WO 
95/23230) or those described in WO 99/16890 (promoters from the barley hordein-gene, the 
rice glutelin gene, the rice oryzin gene, the rice prolamin gene, the wheat gliadin gene, wheat 
glutelin gene, the maize zein gene, the oat glutelin gene, the Sorghum kasirin-gene, and the 
rye secalin gene). 

[076] Plant gene expression can also be facilitated via an inducible promoter (for review see 
Gatz 1997, Annu. Rev. Plant Physiol. Plant Mol. Biol. 48:89-108). Chemically inducible 
promoters are especially suitable if gene expression is desired in a time specific manner. 
Examples for such promoters are a salicylic acid inducible promoter (WO 95/19443), a 
tetracycline inducible promoter (Gatz et al. 1992, Plant J. 2:397-404) and an ethanol 
inducible promoter (WO 93/21334). 

[077] Promoters responding to biotic or abiotic stress conditions are also suitable promoters, 
such as the pathogen inducible PRP1-gene promoter (Ward et al., 1993, Plant Mol. Biol. 
22:361-366), the heat inducible hsp80-promoter from tomato (US 5,187,267), the cold inducible 
alpha-amylase promoter from potato (WO 96/12814) or the wound-inducible pinII-promoter 
(EP 375091). 

[078] Other preferred sequences for use in plant gene expression cassettes are targeting 
sequences necessary to direct the gene product into its appropriate cell compartment (for 
review see Kermode 1996, Crit. Rev. Plant Sci. 15:285-423 and references cited therein) such 
as the vacuole, the nucleus, all types of plastids like amyloplasts, chloroplasts, chromoplasts, 
the extracellular space, mitochondria, the endoplasmic reticulum, oil bodies, peroxisomes and 
other compartments of plant cells. Also especially suited are promoters that confer plastid- 
specific gene expression, as plastids are the compartment where precursors and some end 
products of lipid biosynthesis are synthesized. Suitable promoters such as the viral RNA- 
polymerase promoter are described in WO 95/16783 and WO 97/06250 and the clpP- 
promoter from Arabidopsis described in WO 99/46394. 
[079] The invention further provides a recombinant expression vector comprising a 

DNA molecule of the invention cloned into the expression vector in an antisense orientation. 
That is, the DNA molecule is operatively linked to a regulatory sequence in a manner which 
allows for expression (by transcription of the DNA molecule) of an RNA molecule which is 
antisense to LMP mRNA. Regulatory sequences operatively linked to a nucleic acid cloned 
in the antisense orientation can be chosen which direct the continuous expression of the 
antisense RNA molecule in a variety of cell types, for instance viral promoters and/or 
enhancers, or regulatory sequences can be chosen which direct constitutive, tissue specific or 
cell type specific expression of antisense RNA. The antisense expression vector can be in the 
form of a recombinant plasmid, phagemid or attenuated virus in which antisense nucleic acids 
are produced under the control of a high efficiency regulatory region, the activity of which 
can be determined by the cell type into which the vector is introduced. For a discussion of 
the regulation of gene expression using antisense genes see Weintraub et al. (1986, Antisense 
RNA as a molecular tool for genetic analysis, Reviews - Trends in Genetics, Vol. 1) and Mol 
et al. (1990, FEBS Lett. 268:427-430). 

[080] Another aspect of the invention pertains to host cells into which a recombinant 

expression vector of the invention has been introduced. The terms "host cell" and 
"recombinant host cell" are used interchangeably herein. It is to be understood that such 
terms refer not only to the particular subject cell but also to the progeny or potential progeny 
of such a cell. Because certain modifications may occur in succeeding generations due to 
either mutation or environmental influences, such progeny may not, in fact, be identical to the 
parent cell, but are still included within the scope of the term as used herein. A host cell can 
be any prokaryotic or eukaryotic cell. For example, a LMP can be expressed in bacterial 
cells, insect cells, fungal cells, mammalian cells (such as Chinese hamster ovary cells (CHO) 
or COS cells), algae, ciliates or plant cells. Other suitable host cells are known to those 
skilled in the art. 

[081] Vector DNA can be introduced into prokaryotic or eukaryotic cells via 

conventional transformation or transfection techniques. As used herein, the terms 
"transformation" and "transfection", "conjugation" and "transduction" are intended to refer to 
a variety of art-recognized techniques for introducing foreign nucleic acid (e.g., DNA) into a 
host cell, including calcium phosphate or calcium chloride co-precipitation, DEAE-dextran- 
mediated transfection, lipofection, natural competence, chemical-mediated transfer, or 
electroporation. Suitable methods for transforming or transfecting host cells including plant 
cells can be found in Sambrook et al. (1989, Molecular Cloning: A Laboratory Manual. 2nd 
ed., Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring 
Harbor, NY) and other laboratory manuals such as Methods in Molecular Biology 1995, Vol. 
44, Agrobacterium protocols, ed: Gartland and Davey, Humana Press, Totowa, New Jersey. 
[082] For stable transfection of mammalian and plant cells, it is known that, 

depending upon the expression vector and transfection technique used, only a small fraction 
of cells may integrate the foreign DNA into their genome. In order to identify and select 
these integrants, a gene that encodes a selectable marker (e.g., resistance to antibiotics) is 
generally introduced into the host cells along with the gene of interest. Preferred selectable 
markers include those which confer resistance to drugs, such as G418, hygromycin, 
kanamycin and methotrexate or in plants that confer resistance towards an herbicide such as 
glyphosate or glufosinate. A nucleic acid encoding a selectable marker can be introduced 
into a host cell on the same vector as that encoding a LMP or can be introduced on a separate 
vector. Cells stably transfected with the introduced nucleic acid can be identified by, for 
example, drug selection (e.g., cells that have incorporated the selectable marker gene will 
survive, while the other cells die). 

[083] To create a homologous recombinant microorganism, a vector is prepared 

which contains at least a portion of a LMP gene into which a deletion, addition or substitution 
has been introduced to thereby alter, e.g., functionally disrupt, the LMP gene. Preferably, 
this LMP gene is an Arabidopsis thaliana, Brassica napus, and Physcomitrella patens LMP 
gene, but it can be a homologue from a related plant or even from a mammalian, yeast, or 
insect source. In a preferred embodiment, the vector is designed such that, upon homologous 
recombination, the endogenous LMP gene is functionally disrupted (i.e., no longer encodes a 
functional protein; also referred to as a knock-out vector). Alternatively, the vector can be 
designed such that, upon homologous recombination, the endogenous LMP gene is mutated 
or otherwise altered but still encodes functional protein (e.g., the upstream regulatory region 
can be altered to thereby alter the expression of the endogenous LMP). To create a point 
mutation via homologous recombination, DNA-RNA hybrids can be used in a technique 
known as chimeraplasty (Cole-Strauss et al. 1999, Nucleic Acids Res. 27:1323-1330 and 
Kmiec 1999, American Scientist 87:240-247). Homologous recombination procedures in 
Arabidopsis thaliana are also well known in the art and are contemplated for use herein. 
[084] In a homologous recombination vector, the altered portion of the LMP gene is flanked 
at its 5' and 3' ends by additional nucleic acid of the LMP gene to allow for homologous 
recombination to occur between the exogenous LMP gene carried by the vector and an 
endogenous LMP gene in a microorganism or plant. The additional flanking LMP nucleic 
acid is of sufficient length for successful homologous recombination with the endogenous 
gene. Typically, several hundreds of base pairs up to kilobases of flanking DNA (both at the 
5' and 3' ends) are included in the vector (see e.g., Thomas & Capecchi 1987, Cell 51:503, 
for a description of homologous recombination vectors). The vector is introduced into a 
microorganism or plant cell (e.g., via polyethylene glycol-mediated DNA transfer). Cells in which the 
introduced LMP gene has homologously recombined with the endogenous LMP gene are 
selected using art-known techniques. 
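
The layout of the flanking sequences in such a vector can be illustrated with a short Python sketch. It is a simplified illustration only: the gene sequence, the coordinates of the deleted region, and the arm length are made-up values, and the function name is hypothetical. The sketch takes the 5' and 3' arms from either side of the region to be deleted and joins them, which corresponds to the deletion case described above.

```python
def build_knockout_insert(gene, del_start, del_end, arm_len):
    """Return (5' arm, 3' arm, joined insert) for deleting gene[del_start:del_end].

    Coordinates are 0-based, with del_end exclusive. Both arms must lie
    entirely within the supplied gene sequence.
    """
    if not 0 <= del_start < del_end <= len(gene):
        raise ValueError("deletion coordinates fall outside the gene")
    if del_start < arm_len or len(gene) - del_end < arm_len:
        raise ValueError("not enough flanking sequence for the requested arms")
    five_prime_arm = gene[del_start - arm_len:del_start]
    three_prime_arm = gene[del_end:del_end + arm_len]
    return five_prime_arm, three_prime_arm, five_prime_arm + three_prime_arm


if __name__ == "__main__":
    gene = "AAAACCCCGGGGTTTT"  # made-up sequence; real arms would be far longer
    print(build_knockout_insert(gene, del_start=6, del_end=10, arm_len=4))
```

In practice the arm length would be chosen in the range of several hundred base pairs up to kilobases, as stated above.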

[085] In another embodiment, recombinant microorganisms can be produced which contain 
selected systems which allow for regulated expression of the introduced gene. For example, 
inclusion of an LMP gene on a vector placing it under control of the lac operon permits 
expression of the LMP gene only in the presence of IPTG. Such regulatory systems are well 
known in the art. 

[086] A host cell of the invention, such as a prokaryotic or eukaryotic host cell in culture, 
can be used to produce (i.e., express) an LMP. Accordingly, the invention further provides 
methods for producing LMPs using the host cells of the invention. In one embodiment, the 
method comprises culturing a host cell of the invention (into which a recombinant expression 
vector encoding an LMP has been introduced, or which contains a wild-type or altered LMP 
gene in its genome) in a suitable medium until LMP is produced. In another embodiment, 
the method further comprises isolating LMPs from the medium or the host cell. 
[087] Another aspect of the invention pertains to isolated LMPs, and biologically 

active portions thereof. An "isolated" or "purified" protein or biologically active portion 
thereof is substantially free of cellular material when produced by recombinant DNA 
techniques, or chemical precursors or other chemicals when chemically synthesized. The 
language "substantially free of cellular material" includes preparations of LMP in which the 
protein is separated from cellular components of the cells in which it is naturally or 
recombinantly produced. In one embodiment, the language "substantially free of cellular 
material" includes preparations of LMP having less than about 30% (by dry weight) of non- 
LMP (also referred to herein as a "contaminating protein"), more preferably less than about 
20% of non-LMP, still more preferably less than about 10% of non-LMP, and most 
preferably less than about 5% non-LMP. When the LMP or biologically active portion 
thereof is recombinantly produced, it is also preferably substantially free of culture medium, 
i.e., culture medium represents less than about 20%, more preferably less than about 10%, 
and most preferably less than about 5% of the volume of the protein preparation. The 
language "substantially free of chemical precursors or other chemicals" includes preparations 
of LMP in which the protein is separated from chemical precursors or other chemicals which 
are involved in the synthesis of the protein. In one embodiment, the language "substantially 
free of chemical precursors or other chemicals" includes preparations of LMP having less 
than about 30% (by dry weight) of chemical precursors or non-LMP chemicals, more 
preferably less than about 20% chemical precursors or non-LMP chemicals, still more 
preferably less than about 10% chemical precursors or non-LMP chemicals, and most 
preferably less than about 5% chemical precursors or non-LMP chemicals. In preferred 
embodiments, isolated proteins or biologically active portions thereof lack contaminating 
proteins from the same organism from which the LMP is derived. Typically, such proteins 
are produced by recombinant expression of, for example, an Arabidopsis thaliana and 
Brassica napus LMP in other plants than Arabidopsis thaliana and Brassica napus or 
microorganisms, algae or fungi. 

[088] An isolated LMP or a portion thereof of the invention can participate in the 

metabolism of compounds necessary for the production of seed storage compounds in 
Arabidopsis thaliana and Brassica napus, or of cellular membranes, or has one or more of the 
activities set forth in Table 3. In preferred embodiments, the protein or portion thereof 
comprises an amino acid sequence which is sufficiently homologous to an amino acid 
sequence encoded by a nucleic acid of Appendix A such that the protein or portion thereof 
maintains the ability to participate in the metabolism of compounds necessary for the 
construction of cellular membranes in Arabidopsis thaliana and Brassica napus, or in the 
transport of molecules across these membranes. The portion of the protein is preferably a 
biologically active portion as described herein. In another preferred embodiment, a LMP of 
the invention has an amino acid sequence encoded by a nucleic acid of Appendix A. In yet 
another preferred embodiment, the LMP has an amino acid sequence which is encoded by a 
nucleotide sequence which hybridizes, e.g., hybridizes under stringent conditions, to a 
nucleotide sequence of Appendix A. In still another preferred embodiment, the LMP has an 
amino acid sequence which is encoded by a nucleotide sequence that is at least about 50- 
60%, preferably at least about 60-70%, more preferably at least about 70-80%, 80-90%, 90- 
95%, and even more preferably at least about 96%, 97%, 98%, 99% or more homologous to 
one of the amino acid sequences encoded by a nucleic acid of Appendix A. The preferred 
LMPs of the present invention also preferably possess at least one of the LMP activities 
described herein. For example, a preferred LMP of the present invention includes an amino 
acid sequence encoded by a nucleotide sequence which hybridizes, e.g., hybridizes under 
stringent conditions, to a nucleotide sequence of Appendix A, and which can participate in 
the metabolism of compounds necessary for the construction of cellular membranes in
Arabidopsis thaliana and Brassica napus, or in the transport of molecules across these 
membranes, or which has one or more of the activities set forth in Table 3. 
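By way of illustration only, the percentage tiers recited above can be checked against a pairwise alignment that has already been produced by an alignment program. The sketch below uses percent identity over aligned columns as a stand-in for "homologous"; the function names, the gap character and the choice of denominator are assumptions made for this example and are not taken from the specification.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of aligned columns that hold identical residues.

    Both strings are assumed to come from the same pairwise alignment and
    therefore to have equal length. Columns where both are gaps are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, minimum: float = 50.0) -> bool:
    """True when identity is at least `minimum` percent (50% is the lowest tier recited)."""
    return percent_identity(aligned_a, aligned_b) >= minimum
```

A caller would raise `minimum` to 60, 70, 80, 90 or 96 to test the narrower tiers.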
[089] In other embodiments, the LMP is substantially homologous to an amino acid 

sequence encoded by a nucleic acid of Appendix A and retains the functional activity of the 
protein of one of the sequences encoded by a nucleic acid of Appendix A yet differs in amino 
acid sequence due to natural variation or mutagenesis, as described in detail above. 
Accordingly, in another embodiment, the LMP is a protein which comprises an amino acid 
sequence which is at least about 50-60%, preferably at least about 60-70%, and more 
preferably at least about 70-80, 80-90, 90-95%, and most preferably at least about 96%, 97%, 
98%, 99% or more homologous to an entire amino acid sequence and which has at least one 
of the LMP activities described herein. In another embodiment, the invention pertains to a 
full Arabidopsis thaliana and Brassica napus protein which is substantially homologous to an 
entire amino acid sequence encoded by a nucleic acid of Appendix A.

[090] Dominant negative mutations or trans-dominant suppression can be used to 

reduce the activity of a LMP in transgenic seeds in order to change the levels of seed storage
compounds. To achieve this, a mutation that abolishes the activity of the LMP is created and
the inactive non-functional LMP gene is overexpressed in the transgenic plant. The inactive 
trans-dominant LMP protein competes with the active endogenous LMP protein for substrate 
or interactions with other proteins and dilutes out the activity of the active LMP. In this way 
the biological activity of the LMP is reduced without actually modifying the expression of the 
endogenous LMP gene. This strategy was used by Pontier et al. to modulate the activity of
plant transcription factors (Pontier D, Miao ZH, Lam E, Plant J 2001 Sep;27(6):529-38, 
Trans-dominant suppression of plant TGA factors reveals their negative and positive roles in 
plant defense responses). 

[091] Homologues of the LMP can be generated by mutagenesis, e.g., discrete point 

mutation or truncation of the LMP. As used herein, the term "homologue" refers to a variant 
form of the LMP which acts as an agonist or antagonist of the activity of the LMP. An 
agonist of the LMP can retain substantially the same, or a subset, of the biological activities 
of the LMP. An antagonist of the LMP can inhibit one or more of the activities of the 
naturally occurring form of the LMP, by, for example, competitively binding to a 
downstream or upstream member of the cell membrane component metabolic cascade which 
includes the LMP, or by binding to a LMP which mediates transport of compounds across 
such membranes, thereby preventing translocation from taking place. 
[092] In an alternative embodiment, homologues of the LMP can be identified by

screening combinatorial libraries of mutants, e.g., truncation mutants, of the LMP for LMP 
agonist or antagonist activity. In one embodiment, a variegated library of LMP variants is 
generated by combinatorial mutagenesis at the nucleic acid level and is encoded by a 
variegated gene library. A variegated library of LMP variants can be produced by, for 
example, enzymatically ligating a mixture of synthetic oligonucleotides into gene sequences 
such that a degenerate set of potential LMP sequences is expressible as individual 
polypeptides, or alternatively, as a set of larger fusion proteins (e.g., for phage display) 
containing the set of LMP sequences therein. There are a variety of methods which can be 
used to produce libraries of potential LMP homologues from a degenerate oligonucleotide 
sequence. Chemical synthesis of a degenerate gene sequence can be performed in an 
automatic DNA synthesizer, and the synthetic gene then ligated into an appropriate 
expression vector. Use of a degenerate set of genes allows for the provision, in one mixture, 
of all of the sequences encoding the desired set of potential LMP sequences. Methods for 
synthesizing degenerate oligonucleotides are known in the art (see, e.g., Narang 1983, 
Tetrahedron 39:3; Itakura et al. 1984, Annu. Rev. Biochem. 53:323; Itakura et al. 1984, 
Science 198:1056; Ike et al. 1983, Nucleic Acids Res. 11:477). 
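As a minimal sketch of what a degenerate oligonucleotide encodes, the following example expands a sequence written with the standard IUPAC ambiguity codes into every concrete sequence it represents, and counts the library size without enumerating it. The function names are hypothetical and the example oligonucleotides are illustrative only; they are not LMP sequences.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand_degenerate(oligo: str) -> list[str]:
    """Return every concrete sequence encoded by a degenerate oligonucleotide."""
    choices = [IUPAC[base] for base in oligo.upper()]
    return ["".join(combo) for combo in product(*choices)]


def library_size(oligo: str) -> int:
    """Number of distinct sequences, computed without enumerating them."""
    size = 1
    for base in oligo.upper():
        size *= len(IUPAC[base])
    return size
```

Because the size grows multiplicatively with each ambiguous position, `library_size` is the safer call for long oligonucleotides.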

[093] In addition, libraries of fragments of the LMP coding sequences can be used to 

generate a variegated population of LMP fragments for screening and subsequent selection of 
homologues of a LMP. In one embodiment, a library of coding sequence fragments can be 
generated by treating a double stranded PCR fragment of a LMP coding sequence with a 
nuclease under conditions wherein nicking occurs only about once per molecule, denaturing 
the double stranded DNA, renaturing the DNA to form double stranded DNA which can 
include sense/antisense pairs from different nicked products, removing single stranded 
portions from reformed duplexes by treatment with S1 nuclease, and ligating the resulting
fragment library into an expression vector. By this method, an expression library can be 
derived which encodes N-terminal, C-terminal and internal fragments of various sizes of the 
LMP. 

[094] Several techniques are known in the art for screening gene products of 

combinatorial libraries made by point mutations or truncation, and for screening cDNA 
libraries for gene products having a selected property. Such techniques are adaptable for 
rapid screening of the gene libraries generated by the combinatorial mutagenesis of LMP 
homologues. The most widely used techniques, which are amenable to high through-put 
analysis, for screening large gene libraries typically include cloning the gene library into 
replicable expression vectors, transforming appropriate cells with the resulting library of
vectors, and expressing the combinatorial genes under conditions in which detection of a 
desired activity facilitates isolation of the vector encoding the gene whose product was 
detected. Recursive ensemble mutagenesis (REM), a new technique which enhances the 
frequency of functional mutants in the libraries, can be used in combination with the 
screening assays to identify LMP homologues (Arkin & Youvan 1992, Proc. Natl. Acad. Sci.
USA 89:7811-7815; Delagrave et al. 1993, Protein Engineering 6:327-331).
[095] In another embodiment, cell based assays can be exploited to analyze a 

variegated LMP library, using methods well known in the art. 

[096] The nucleic acid molecules, proteins, protein homologues, fusion proteins, 

primers, vectors, and host cells described herein can be used in one or more of the following 
methods: identification of Arabidopsis thaliana and Brassica napus and related organisms;
mapping of genomes of organisms related to Arabidopsis thaliana and Brassica napus;
identification and localization of Arabidopsis thaliana and Brassica napus sequences of
interest; evolutionary studies; determination of LMP regions required for function; 
modulation of a LMP activity; modulation of the metabolism of one or more cell functions; 
modulation of the transmembrane transport of one or more compounds; and modulation of 
seed storage compound accumulation. 

[097] The plant Arabidopsis thaliana represents one member of higher (or seed) plants. It is 
related to other plants such as Brassica napus or soybean which require light to drive 
photosynthesis and growth. Plants like Arabidopsis thaliana and Brassica napus share a high 
degree of homology on the DNA sequence and polypeptide level, allowing the use of 
heterologous screening of DNA molecules with probes evolving from other plants or 
organisms, thus enabling the derivation of a consensus sequence suitable for heterologous 
screening or functional annotation and prediction of gene functions in third species. The 
ability to identify such functions can therefore have significant relevance, e.g., prediction of 
substrate specificity of enzymes. Further, these nucleic acid molecules may serve as 
reference points for the mapping of Arabidopsis genomes, or of genomes of related 
organisms. 

[098] The LMP nucleic acid molecules of the invention have a variety of uses. First, 

they may be used to identify an organism as being Arabidopsis thaliana, Brassica napus, and 
Physcomitrella patens or a close relative thereof. Also, they may be used to identify the 
presence of Arabidopsis thaliana, Brassica napus, and Physcomitrella patens or a relative 
thereof in a mixed population of microorganisms. The invention provides the nucleic acid 
sequences of a number of Arabidopsis thaliana and Brassica napus genes; by probing the
extracted genomic DNA of a culture of a unique or mixed population of microorganisms 
under stringent conditions with a probe spanning a region of an Arabidopsis thaliana and 
Brassica napus gene which is unique to this organism, one can ascertain whether this 
organism is present. 

[099] Further, the nucleic acid and protein molecules of the invention may serve as markers 
for specific regions of the genome. This has utility not only in the mapping of the genome, 
but also for functional studies of Arabidopsis thaliana and Brassica napus proteins. For 
example, to identify the region of the genome to which a particular Arabidopsis thaliana and 
Brassica napus DNA-binding protein binds, the Arabidopsis thaliana and Brassica napus 
genome could be digested, and the fragments incubated with the DNA-binding protein. Those 
which bind the protein may be additionally probed with the nucleic acid molecules of the 
invention, preferably with readily detectable labels; binding of such a nucleic acid molecule 
to the genome fragment enables the localization of the fragment to the genome map of 
Arabidopsis thaliana and Brassica napus, and, when performed multiple times with different 
enzymes, facilitates a rapid determination of the nucleic acid sequence to which the protein 
binds. Further, the nucleic acid molecules of the invention may be sufficiently homologous to 
the sequences of related species such that these nucleic acid molecules may serve as markers 
for the construction of a genomic map in related plants. 

[0100] The LMP nucleic acid molecules of the invention are also useful for

evolutionary and protein structural studies. The metabolic and transport processes in which 
the molecules of the invention participate are utilized by a wide variety of prokaryotic and 
eukaryotic cells; by comparing the sequences of the nucleic acid molecules of the present 
invention to those encoding similar enzymes from other organisms, the evolutionary 
relatedness of the organisms can be assessed. Similarly, such a comparison permits an 
assessment of which regions of the sequence are conserved and which are not, which may aid 
in determining those regions of the protein which are essential for the functioning of the
enzyme. This type of determination is of value for protein engineering studies and may give 
an indication of what the protein can tolerate in terms of mutagenesis without losing function. 
[0101] Manipulation of the LMP nucleic acid molecules of the invention may result in 

the production of LMPs having functional differences from the wild-type LMPs. These 
proteins may be improved in efficiency or activity, may be present in greater numbers in the 
cell than is usual, or may be decreased in efficiency or activity. 
[0102] There are a number of mechanisms by which the alteration of a LMP of the invention
may directly affect the accumulation of seed storage compounds. In the case of plants 
expressing LMPs, increased transport can lead to altered accumulation of compounds and/or 
solute partitioning within the plant tissue and organs which ultimately could be used to affect 
the accumulation of one or more seed storage compounds during seed development. An 
example is provided by Mitsukawa et al. (1997, Proc. Natl. Acad. Sci. USA 94:7098-7102), 
where overexpression of an Arabidopsis high-affinity phosphate transporter gene in tobacco
cultured cells enhanced cell growth under phosphate-limited conditions. Phosphate 
availability also affects significantly the production of sugars and metabolic intermediates 
(Hurry et al. 2000, Plant J. 24:383-396) and the lipid composition in leaves and roots (Hartel 
et al. 2000, Proc. Natl. Acad. Sci. USA 97:10649-10654). Likewise, the activity of the plant 
ACCase has been demonstrated to be regulated by phosphorylation (Savage & Ohlrogge 
1999, Plant J. 18:521-527) and alterations in the activity of the kinases and phosphatases 
(LMPs) that act on the ACCase could lead to increased or decreased levels of seed lipid 
accumulation. Moreover, the presence of lipid kinase activities in chloroplast envelope 
membranes suggests that signal transduction pathways and/or membrane protein regulation 
occur in envelopes (see, e.g., Muller et al. 2000, J. Biol. Chem. 275:19475-19481 and 
literature cited therein). The ABI1 and ABI2 genes encode two protein serine/threonine 
phosphatases 2C, which are regulators in abscisic acid signaling pathway, and thereby in 
early and late seed development (e.g. Merlot et al. 2001, Plant J. 25:295-303). For more 
examples see also the section 'background of the invention'. 

[0103] The present invention also provides antibodies which specifically bind to an LMP
polypeptide, or a portion thereof, as encoded by a nucleic acid disclosed herein or as
described herein. 

[0104] Antibodies can be made by many well-known methods (see, e.g. Harlow and Lane, 
"Antibodies; A Laboratory Manual" Cold Spring Harbor Laboratory, Cold Spring Harbor, 
New York, 1988). Briefly, purified antigen can be injected into an animal in an amount and 
in intervals sufficient to elicit an immune response. Antibodies can either be purified 
directly, or spleen cells can be obtained from the animal. The cells can then be fused with an
immortal cell line and screened for antibody secretion. The antibodies can be used to screen 
nucleic acid clone libraries for cells secreting the antigen. Those positive clones can then be 
sequenced (see, for example, Kelly et al. 1992, Bio/Technology 10:163-167; Bebbington et 
al. 1992, Bio/Technology 10:169-175). 
[0105] The phrase "selectively binds" with the polypeptide refers to a binding reaction which
is determinative of the presence of the protein in a heterogeneous population of proteins and
other biologics. Thus, under designated immunoassay conditions, the specified antibodies
bound to a particular protein do not bind in a significant amount to other proteins present in
the sample. Selective binding to an antibody under such conditions may require an antibody
that is selected for its specificity for a particular protein. A variety of immunoassay formats
may be used to select antibodies that selectively bind with a particular protein. For example, 
solid-phase ELISA immunoassays are routinely used to select antibodies selectively 
immunoreactive with a protein. See Harlow and Lane "Antibodies, A Laboratory Manual" 
Cold Spring Harbor Publications, New York (1988), for a description of immunoassay 
formats and conditions that could be used to determine selective binding. 
[0106] In some instances, it is desirable to prepare monoclonal antibodies from various hosts.
A description of techniques for preparing such monoclonal antibodies may be found in Stites 
et al., editors, "Basic and Clinical Immunology," (Lange Medical Publications, Los Altos, 
Calif., Fourth Edition) and references cited therein, and in Harlow and Lane ("Antibodies, A 
Laboratory Manual" Cold Spring Harbor Publications, New York, 1988). 
[0107] Throughout this application, various publications are referenced. The disclosures of 
all of these publications and those references cited within those publications in their entireties 
are hereby incorporated by reference into this application in order to more fully describe the 
state of the art to which this invention pertains. 

[0108] It will be apparent to those skilled in the art that various modifications and variations
can be made in the present invention without departing from the scope or spirit of the 
invention. Other embodiments of the invention will be apparent to those skilled in the art 
from consideration of the specification and practice of the invention disclosed herein. It is 
intended that the specification and Examples be considered as exemplary only, with a true 
scope and spirit of the invention being indicated by the claims included herein. 



EXAMPLES 

Example 1 

General Processes 

a) General Cloning Processes: 

[0109] Cloning processes such as, for example, restriction cleavages, agarose gel 

electrophoresis, purification of DNA fragments, transfer of nucleic acids to nitrocellulose and 
nylon membranes, linkage of DNA fragments, transformation of Escherichia coli and yeast
cells, growth of bacteria and sequence analysis of recombinant DNA were carried out as 
described in Sambrook et al. (1989, Cold Spring Harbor Laboratory Press: ISBN 0-87969- 
309-6) or Kaiser, Michaelis and Mitchell (1994, "Methods in Yeast Genetics," Cold Spring 
Harbor Laboratory Press: ISBN 0-87969-451-3). 

b) Chemicals:

[0110] The chemicals used were obtained, if not mentioned otherwise in the text, in
p.a. quality from the companies Fluka (Neu-Ulm), Merck (Darmstadt), Roth (Karlsruhe),
Serva (Heidelberg), and Sigma (Deisenhofen). Solutions were prepared using purified,
pyrogen-free water, designated as H2O in the following text, from a Milli-Q water
purification system (Millipore, Eschborn). Restriction endonucleases, DNA-modifying
enzymes, and molecular biology kits were obtained from the companies AGS (Heidelberg),
Amersham (Braunschweig), Biometra (Göttingen), Boehringer (Mannheim), Genomed (Bad
Oeynhausen), New England Biolabs (Schwalbach/Taunus), Novagen (Madison, Wisconsin,
USA), Perkin-Elmer (Weiterstadt), Pharmacia (Freiburg), Qiagen (Hilden), and Stratagene
(Amsterdam, Netherlands). They were used, if not mentioned otherwise, according to the
manufacturer's instructions.

c) Plant Material: 
Arabidopsis pkl mutant 

[0111] For this study, in one series of experiments, root material of wild-type and 

pickle mutant Arabidopsis thaliana plants was used. The pkl mutation was isolated from an
ethyl methanesulfonate-mutagenized population of the Columbia ecotype as described (Ogas 
et al., 1997, Science 277:91-94; Ogas et al., 1999, Proc. Natl. Acad. Sci. USA 96:13839- 
13844). In other series of experiments, siliques of individual ecotypes of Arabidopsis 
thaliana and of selected Arabidopsis phytohormone mutants were used. Seeds were obtained
from the Arabidopsis stock center. 

Brassica napus AC Excel and Cresor varieties 

[0112] Brassica napus varieties AC Excel and Cresor were used for this study to 

create cDNA libraries. Seed, seed pod, flower, leaf, stem, and root tissues were collected 
from plants that were in some cases dark-, salt-, heat-, and drought-treated. However, this
study focused on the use of seed and seed pod tissues for cDNA libraries. 

d) Plant Growth: 
Arabidopsis thaliana 
[0113] Plants were either grown on Murashige-Skoog medium as described in Ogas et
al. (1997, Science 277:91-94; 1999, Proc. Natl. Acad. Sci. USA 96:13839-13844) or on soil
under standard conditions as described in Focks & Benning (1998, Plant Physiol. 118:91- 
101). 

Brassica napus

[0114] Plants (AC Excel, except where mentioned) were grown in Metromix (Scotts,
Marysville, OH) at 22°C under a 14/10 light/dark cycle. Six seed and seed pod tissues of
interest in this study were collected to create the following cDNA libraries: Immature seeds, 
mature seeds, immature seed pods, mature seed pods, night-harvested seed pods, and Cresor 
variety (high erucic acid) seeds. Tissue samples were collected within specified time points 
for each developing tissue and multiple samples within a time frame pooled together for 
eventual extraction of total RNA. Samples from immature seeds were taken between 1-25 
days after anthesis (daa), mature seeds between 25-50 daa, immature seed pods between 1-15 
daa, mature seed pods between 15-50 daa, night-harvested seed pods between 1-50 daa and 
Cresor seeds 5-25 daa. 
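The collection windows just listed can be captured in a small lookup table, for example to check which library a sample taken on a given day belongs to. The sketch below is illustrative only: the dictionary and function names are hypothetical, and treating both ends of each range as inclusive is an assumption, since the text does not say how boundary days such as 15 or 25 daa were assigned.

```python
# Collection windows in days after anthesis (daa), copied from the paragraph above.
# Treating both ends of each range as inclusive is an assumption.
COLLECTION_WINDOWS = {
    "immature seeds": (1, 25),
    "mature seeds": (25, 50),
    "immature seed pods": (1, 15),
    "mature seed pods": (15, 50),
    "night-harvested seed pods": (1, 50),
    "Cresor seeds": (5, 25),
}


def libraries_for(daa: int, tissue_keyword: str) -> list[str]:
    """Libraries whose name contains tissue_keyword and whose window covers daa."""
    return [name for name, (start, end) in COLLECTION_WINDOWS.items()
            if tissue_keyword in name and start <= daa <= end]
```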



Example 2 

Total DNA Isolation from Plants 

[0115] The details for the isolation of total DNA relate to the working up of one gram 

fresh weight of plant material. 

[0116] CTAB buffer: 2% (w/v) N-cetyl-N,N,N-trimethylammonium bromide (CTAB); 100
mM Tris HCl pH 8.0; 1.4 M NaCl; 20 mM EDTA. N-Laurylsarcosine buffer: 10% (w/v)
N-laurylsarcosine; 100 mM Tris HCl pH 8.0; 20 mM EDTA.

[0117] The plant material was triturated under liquid nitrogen in a mortar to give a 

fine powder and transferred to 2 ml Eppendorf vessels. The frozen plant material was then 
covered with a layer of 1 ml of decomposition buffer (1 ml CTAB buffer, 100 µl of
N-laurylsarcosine buffer, 20 µl of β-mercaptoethanol and 10 µl of proteinase K solution, 10
mg/ml) and incubated at 60°C for one hour with continuous shaking. The homogenate
obtained was distributed into two Eppendorf vessels (2 ml) and extracted twice by shaking
with the same volume of chloroform/isoamyl alcohol (24:1). For phase separation,
centrifugation was carried out at 8000 g and RT for 15 minutes in each case. The DNA was
then precipitated at -70°C for 30 minutes using ice-cold isopropanol. The precipitated DNA
was sedimented at 4°C and 10,000 g for 30 minutes and resuspended in 180 µl of TE buffer
(Sambrook et al., 1989, Cold Spring Harbor Laboratory Press: ISBN 0-87969-309-6). For
further purification, the DNA was treated with NaCl (1.2 M final concentration) and
precipitated again at -70°C for 30 minutes using twice the volume of absolute ethanol. After a
washing step with 70% ethanol, the DNA was dried and subsequently taken up in 50 µl of
H2O + RNAse (50 mg/ml final concentration). The DNA was dissolved overnight at 4°C and
the RNAse digestion was subsequently carried out at 37°C for 1 hour. Storage of the DNA 
took place at 4°C. 

Example 3 

Isolation of Total RNA and poly-(A)+ RNA from Plants

Arabidopsis thaliana

[0118] For the investigation of transcripts, both total RNA and poly-(A)+ RNA were isolated.
RNA was isolated from siliques of Arabidopsis plants according to the following procedure:

[0119] RNA preparation from Arabidopsis seeds - "hot" extraction: 

Buffers, enzymes, and solutions:

- 2 M KCl
- Proteinase K
- Phenol (for RNA)
- Chloroform:Isoamylalcohol
  (Phenol:chloroform 1:1; pH adjusted for RNA)
- 4 M LiCl, DEPC-treated
- DEPC-treated water
- 3M NaOAc, pH 5, DEPC-treated
- Isopropanol
- 70% ethanol (made up with DEPC-treated water)
- Resuspension buffer: 0.5% SDS, 10 mM Tris pH 7.5, 1 mM EDTA made up with
  DEPC-treated water as this solution cannot be DEPC-treated
- Extraction Buffer:
  0.2 M Na borate
  30 mM EDTA
  30 mM EGTA
  1% SDS (250ul of 10% SDS-solution for 2.5ml buffer)
  1% Deoxycholate (25mg for 2.5ml buffer)
  2% PVPP (insoluble - 50mg for 2.5ml buffer)
  2% PVP 40K (50mg for 2.5ml buffer)
  10 mM DTT
  100 mM β-Mercaptoethanol (fresh, handle under fume hood - use 35ul of 14.3M solution for
  5ml buffer)
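The stock volumes quoted in parentheses in the list above follow from the usual dilution relation C1 x V1 = C2 x V2. A minimal sketch, with a hypothetical function name, that reproduces two of the quoted figures is given below; it assumes the final and stock concentrations are expressed in the same unit.

```python
def stock_volume_ul(final_conc: float, final_volume_ml: float, stock_conc: float) -> float:
    """Volume of stock solution, in microliters, needed to reach final_conc
    in final_volume_ml of buffer (C1 * V1 = C2 * V2).

    final_conc and stock_conc must share one unit (for example M, or % w/v).
    """
    if stock_conc <= 0:
        raise ValueError("stock concentration must be positive")
    if final_conc > stock_conc:
        raise ValueError("final concentration cannot exceed the stock concentration")
    return final_conc * final_volume_ml / stock_conc * 1000.0


# 100 mM beta-mercaptoethanol in 5 ml buffer from a 14.3 M stock: about 35 ul.
bme_ul = stock_volume_ul(0.100, 5.0, 14.3)

# 1% SDS in 2.5 ml buffer from a 10% stock: 250 ul.
sds_ul = stock_volume_ul(1.0, 2.5, 10.0)
```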

Extraction

[0120] Extraction buffer was heated up to 80°C. Tissues were ground in a liquid
nitrogen-cooled mortar, and the tissue powder was transferred to a 1.5ml tube. Tissues should be kept
frozen until buffer is added; the sample should be transferred with a pre-cooled spatula; and
the tube should be kept in liquid nitrogen at all times. Then 350ul of preheated extraction buffer
was added to the tube (for 100mg tissue; buffer volume can be as much as 500ul for bigger
samples); samples were vortexed; and the tube was heated to 80°C for approximately 1 minute
and then kept on ice. The samples were vortexed and ground additionally with an electric
mortar.

[0121] Proteinase K (0.15mg/100mg tissue) was added, and the mixture was vortexed and
then kept at 37°C for one hour. 

First Purification

[0122] For purification, 27ul 2M KCl was added to the samples. The samples were chilled on
ice for 10 minutes and then centrifuged at 12,000 rpm for 10 minutes at room temperature.
The supernatant was transferred to a fresh, RNAase-free tube, and one phenol extraction was
conducted, followed by a chloroform:isoamylalcohol extraction. One volume of isopropanol
was added to the supernatant, and the mixture was chilled on ice for 10 minutes. RNA was
pelleted by centrifugation (7000 rpm for 10 minutes at room temperature). Pellets were
dissolved in 1 ml 4M LiCl solution by vortexing the mixture 10 to 15 minutes. RNA was
pelleted by a 5 minute centrifugation.

[0123] The pellet was resuspended in 500ul Resuspension buffer. Then 500 ul of phenol was
added, and the mixture was vortexed. Then, 250ul chloroform:isoamylalcohol was added; the
mixture was vortexed and then centrifuged for 5 minutes. The supernatant was transferred to
a fresh tube. The chloroform:isoamylalcohol extraction was repeated until the interface was
clear. The supernatant was transferred to a fresh tube and 1/10 volume 3M NaOAc, pH 5 and
600ul isopropanol were added. The mixture was kept at -20°C for 20 minutes or longer. The
RNA was pelleted by 10 minutes of centrifugation, and then the pellet was washed once with
70% ethanol. All remaining alcohol was removed before dissolving the pellet in 15 to 20 ul
DEPC-treated water. The quantity and quality of the RNA were determined by measuring the
absorbance of a 1:200 dilution at 260nm and 280nm. (40ug RNA/ml = 1 OD260)
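The quantification step above can be sketched as a short calculation using the 1:200 dilution and the conversion of 40ug RNA/ml per OD260 unit stated in the text. The function names are hypothetical. The A260/A280 ratio is returned as a purity indicator; the commonly cited rule of thumb that a ratio near 2.0 indicates clean RNA is general laboratory practice and is not a value given in this specification.

```python
def rna_concentration_ug_per_ml(a260: float,
                                dilution_factor: float = 200.0,
                                ug_per_ml_per_od: float = 40.0) -> float:
    """Concentration of the undiluted RNA sample in ug/ml.

    a260 is the absorbance of the diluted sample; the defaults follow the
    1:200 dilution and 40 ug RNA/ml per OD260 unit given in the text.
    """
    return a260 * dilution_factor * ug_per_ml_per_od


def a260_a280_ratio(a260: float, a280: float) -> float:
    """Ratio of absorbance at 260 nm to absorbance at 280 nm."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280
```

For example, a diluted sample reading 0.25 at 260nm corresponds to 0.25 x 200 x 40 = 2000 ug/ml in the undiluted preparation.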
[0124] RNA from roots of wild-type Arabidopsis and the pickle mutant of 

Arabidopsis was isolated as described (Ogas et al., 1997, Science 277:91-94; Ogas et al., 
1999, Proc. Natl. Acad. Sci. USA 96:13839-13844). 

[0125] The mRNA was prepared from total RNA, using the Amersham Pharmacia 

Biotech mRNA purification kit, which utilizes oligo(dT)-cellulose columns. 
[0126] Poly-(A)+ RNA was isolated using Dynabeads® (Dynal, Oslo,
Norway) following the instructions of the manufacturer's protocol. After determination of
the concentration of the RNA or of the poly(A)+ RNA, the RNA was precipitated by addition
of 1/10 volume of 3 M sodium acetate pH 4.6 and 2 volumes of ethanol and stored at -70°C.

[0127] Seeds were separated from pods to create homogeneous materials for seed and seed
pod cDNA libraries. Tissues were ground into fine powder under liquid nitrogen using a 
mortar and pestle and transferred to a 50 ml tube. Tissue samples were stored at -80 °C until 
extractions could be performed. Total RNA was extracted from tissues using RNeasy Maxi 
kit (Qiagen) according to manufacturer's protocol, and mRNA was processed from total 
RNA using Oligotex mRNA Purification System kit (Qiagen), also according to 
manufacturer's protocol. The mRNA was sent to Hyseq Pharmaceuticals Incorporated 
(Sunnyvale, CA) for further processing of mRNA from each tissue type into cDNA libraries
and for use in their proprietary processes in which similar inserts in plasmids are clustered 
based on hybridization patterns. 

Example 4 

cDNA Library Construction

[0128] For cDNA library construction, first strand synthesis was achieved using
Murine Leukemia Virus reverse transcriptase (Roche, Mannheim, Germany) and
oligo-d(T)-primers, second strand synthesis by incubation with DNA polymerase I, Klenow enzyme and
RNAseH digestion at 12°C (2 hours), 16°C (1 hour) and 22°C (1 hour). The reaction was
stopped by incubation at 65°C (10 minutes) and subsequently transferred to ice. Double
stranded DNA molecules were blunted by T4-DNA-polymerase (Roche, Mannheim) at 37°C
(30 minutes). Nucleotides were removed by phenol/chloroform extraction and Sephadex G50
spin columns. EcoRI adapters (Pharmacia, Freiburg, Germany) were ligated to the cDNA
ends by T4-DNA-ligase (Roche, 12°C, overnight) and phosphorylated by incubation with
polynucleotide kinase (Roche, 37°C, 30 minutes). This mixture was subjected to separation
on a low melting agarose gel. DNA molecules larger than 300 base pairs were eluted from
the gel, phenol extracted, concentrated on Elutip-D-columns (Schleicher and Schuell, Dassel,
Germany) and were ligated to vector arms and packed into lambda ZAPII phages or lambda
ZAP-Express phages using the Gigapack Gold Kit (Stratagene, Amsterdam, Netherlands),
using material and following the instructions of the manufacturer.

[0129] Brassica cDNA libraries were generated at Hyseq Pharmaceuticals 

Incorporated (Sunnyvale, CA). No amplification steps were used in the library production to
retain expression information. Hyseq's genomic approach involves grouping the genes into
clusters and then sequencing representative members from each cluster. The cDNA libraries
were generated from oligo dT column purified mRNA. Colonies from transformation of the
cDNA library into E. coli were randomly picked and the cDNA inserts were amplified by PCR
and spotted on nylon membranes. A set of 33P-radiolabeled oligonucleotides was
hybridized to the clones, and the resulting hybridization pattern determined to which cluster 
a particular clone belonged. The cDNA clones and their DNA sequences were obtained for 
use in overexpression in transgenic plants and in other molecular biology processes described 
herein. 
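The grouping of clones by hybridization pattern can be pictured with a deliberately simplified sketch. The actual Hyseq process is proprietary and is not described in this specification, so everything below is an assumption made for illustration: each clone is given a list of probe signal intensities, a signal at or above a hypothetical threshold counts as a positive hybridization, and clones with identical positive/negative patterns are placed in the same cluster.

```python
from collections import defaultdict


def cluster_by_signature(signals: dict[str, list[float]],
                         threshold: float = 0.5) -> list[list[str]]:
    """Group clone identifiers whose thresholded hybridization patterns are identical.

    signals maps a clone identifier to one intensity per probe. The threshold
    and the exact-match grouping rule are hypothetical simplifications.
    """
    clusters: dict[tuple[bool, ...], list[str]] = defaultdict(list)
    for clone, intensities in signals.items():
        signature = tuple(value >= threshold for value in intensities)
        clusters[signature].append(clone)
    return [sorted(members) for members in clusters.values()]


def pick_representatives(clusters: list[list[str]]) -> list[str]:
    """Choose one clone per cluster (here simply the first) for sequencing."""
    return [members[0] for members in clusters]
```

Sequencing only the representatives reflects the stated aim of sequencing representative members from each cluster rather than every clone.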

Example 5 

Identification of LMP Genes of Interest

Arabidopsis thaliana pkl mutant

[0130] The pickle Arabidopsis mutant was used to identify LMP-encoding genes. The pickle
mutant accumulates seed storage compounds, such as seed storage lipids and seed storage
proteins, in the root tips (Ogas et al., 1997, Science 277:91-94; Ogas et al., 1999, Proc. Natl.
Acad. Sci. USA 96:13839-13844). The mRNA isolated from roots of wild-type and pickle 
plants was used to create a subtracted and normalized cDNA library (SSH library) containing 
cDNAs that are only present in the pickle roots, but not in the wild-type roots. Clones from 
the SSH library were spotted onto nylon membranes and hybridized with radio-labeled pickle 
or wild-type root mRNA to ascertain that the SSH clones were more abundant in pickle roots 
compared to wild-type roots. These SSH clones were randomly sequenced and the sequences 
were annotated (See Example 9). Based on the expression levels and on these initial 
functional annotations (See Table 3), clones from the SSH library were identified as potential 
LMP-encoding genes. 

[0131] To identify additional potential gene targets from the Arabidopsis pickle
mutant, the MegaSort™ and MPSS technologies of Lynx Therapeutics Inc. were used.
MegaSort is a micro-bead technology that allows the simultaneous collection of millions
of clones on as many micro-beads (See Brenner et al., 1999, Proc. Natl. Acad. Sci. USA
97:1665-1670). Genes are identified based on their differential expression in wild-type and
pickle Arabidopsis mutant roots. RNA and mRNA are isolated from wild-type and mutant
roots using standard procedures. The MegaSort technology enables the identification of
over- and under-expressed clones in two mRNA samples without prior knowledge of the
genes and is thus useful to discover differentially expressed genes that can encode LMP
proteins. The MPSS technology enables the quantitation of the abundance of mRNA
transcripts in mRNA samples (Brenner et al., Nat Biotechnol. 18:630-4) and was used to
obtain expression profiles of wild-type and pickle root mRNAs. 

[0132] Other LMP candidate genes were identified by randomly selecting various
Arabidopsis phytohormone mutants (e.g. mutants obtained from EMS treatment) from the
Arabidopsis stock center. These mutants and control wild-type plants were grown under
standard conditions in growth chambers and screened for the accumulation of seed storage
compounds. Mutants showing altered levels of seed storage compounds were considered as
having a mutation in an LMP candidate gene and were investigated further.

Brassica napus

[0133] An expression profile was obtained from the Hyseq clustering process.
Clones showing 75% or greater expression in seed libraries compared to the other tissue 
libraries were selected as LMP candidate genes. The Brassica napus clones were selected for 
overexpression in Arabidopsis based on their expression profile. 
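
The selection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say how "75% or greater expression in seed libraries" was computed, so the sketch assumes it means the seed library's share of a clone's total hits across all tissue libraries. The clone names, tissue labels, and counts are hypothetical.

```python
from typing import Dict, List

def seed_fraction(counts: Dict[str, int], seed_key: str = "seed") -> float:
    """Share of a clone's total library hits that come from the seed library."""
    total = sum(counts.values())
    if total == 0:
        return 0.0  # a clone with no hits cannot be called seed-enriched
    return counts.get(seed_key, 0) / total

def select_seed_enriched(profiles: Dict[str, Dict[str, int]],
                         threshold: float = 0.75) -> List[str]:
    """Return clones whose seed share is at or above the threshold."""
    return [clone for clone, counts in profiles.items()
            if seed_fraction(counts) >= threshold]

# Hypothetical hit counts per tissue library (not data from the document).
profiles = {
    "clone_A": {"seed": 9, "leaf": 1, "root": 0},
    "clone_B": {"seed": 3, "leaf": 3, "root": 2},
    "clone_C": {"seed": 6, "leaf": 1, "root": 1},
}

selected = select_seed_enriched(profiles)
print(selected)
```

Under this reading, a clone sitting exactly at the 75% boundary is kept, which matches the "75% or greater" wording.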

Example 6 

Cloning of full-length cDNAs and orthologs of identified LMP genes

[0134] Full-length sequences of the Arabidopsis thaliana partial cDNAs (ESTs) that
were identified in the SSH library and from MegaSort and MPSS EST sequencing were
isolated by RACE PCR using the SMART RACE cDNA amplification kit from Clontech
allowing both 5' and 3' rapid amplification of cDNA ends (RACE). The isolation of cDNAs
and the RACE PCR protocol used were based on the manufacturer's conditions. The RACE
product fragments were extracted from agarose gels with a QIAquick Gel Extraction Kit
(Qiagen) and ligated into the TOPO pCR 2.1 vector (Invitrogen) following manufacturer's
instructions. Recombinant vectors were transformed into TOP10 cells (Invitrogen) using
standard conditions (Sambrook et al., 1989). Transformed cells were grown overnight at 37°C
on LB agar containing 50 µg/ml kanamycin and spread with 40 µl of a 40 mg/ml stock
solution of X-gal in dimethylformamide for blue-white selection. Single white colonies were
selected and used to inoculate 3 ml of liquid LB containing 50 µg/ml kanamycin and grown
overnight at 37°C. Plasmid DNA was extracted using the QIAprep Spin Miniprep Kit
(Qiagen) following manufacturer's instructions. Subsequent analyses of clones and
restriction mapping were performed according to standard molecular biology techniques
(Sambrook et al., 1989).

[0135] Gene sequences can be used to identify homologous or heterologous genes (orthologs,
the same LMP gene from another plant) from cDNA or genomic libraries. This can be done
by designing PCR primers to conserved sequences identified by multiple sequence
alignments. Orthologs are often identified by designing degenerate primers to full-length or
partial sequences of genes of interest. Homologous genes (e.g. full-length cDNA clones) can
be isolated via nucleic acid hybridization using, for example, cDNA libraries. Depending on
the abundance of the gene of interest, 100,000 up to 1,000,000 recombinant bacteriophages
are plated and transferred to nylon membranes. After denaturation with alkali, DNA is
immobilized on the membrane by, e.g., UV cross-linking. Hybridization is carried out at high
stringency conditions. Aqueous solution hybridization and washing is performed at an ionic
strength of 1 M NaCl and a temperature of 68°C. Hybridization probes are generated by,
e.g., radioactive (32P) nick transcription labeling (High Prime, Roche, Mannheim, Germany).
Signals are detected by autoradiography.

[0136] Partially homologous or heterologous genes that are related but not identical
can be identified in a procedure analogous to the above-described procedure using low
stringency hybridization and washing conditions. For aqueous hybridization, the ionic
strength is normally kept at 1 M NaCl while the temperature is progressively lowered from
68 to 42°C.

[0137] Isolation of gene sequences with homology (or sequence identity/similarity)
only in a distinct domain (for example 10-20 amino acids) can be carried out by using
synthetic radiolabeled oligonucleotide probes. Radiolabeled oligonucleotides are prepared by
phosphorylation of the 5-prime end of two complementary oligonucleotides with T4
polynucleotide kinase. The complementary oligonucleotides are annealed and ligated to form
concatemers. The double stranded concatemers are then radiolabeled by, for example, nick
transcription. Hybridization is normally performed at low stringency conditions using high
oligonucleotide concentrations.

Oligonucleotide hybridization solution:
6 x SSC
0.01 M sodium phosphate
1 mM EDTA (pH 8)
0.5 % SDS
100 µg/ml denatured salmon sperm DNA
0.1 % nonfat dried milk

[0138] During hybridization, temperature is lowered stepwise to 5-10°C below the
estimated oligonucleotide Tm or down to room temperature followed by washing steps and
autoradiography. Washing is performed with low stringency such as three washing steps
using 4 x SSC. Further details are described by Sambrook et al. (1989, "Molecular Cloning:
A Laboratory Manual", Cold Spring Harbor Laboratory Press) or Ausubel et al. (1994,
"Current Protocols in Molecular Biology", John Wiley & Sons).

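As a rough illustration of the "5-10°C below the estimated oligonucleotide Tm" target, the following Python sketch estimates Tm with the Wallace rule (2°C per A or T, 4°C per G or C). The text does not state how Tm was estimated, so the choice of this rule is an assumption; it is a coarse approximation for short oligonucleotides only. The function names are hypothetical, and the 18-mer is used purely as an example input.

```python
from typing import Tuple

def wallace_tm(oligo: str) -> int:
    """Estimate Tm in degrees C with the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("sequence contains characters other than A, C, G, T")
    return 2 * at + 4 * gc

def final_hybridization_window(oligo: str) -> Tuple[int, int]:
    """Temperature range lying 10 to 5 degrees C below the estimated Tm."""
    tm = wallace_tm(oligo)
    return (tm - 10, tm - 5)

# Example input: an 18-mer with 9 A/T and 9 G/C bases.
example = "CAGGAAACAGCTATGACC"
print(wallace_tm(example))                  # 54
print(final_hybridization_window(example))  # (44, 49)
```

For longer probes or different salt concentrations, a nearest-neighbor or salt-adjusted formula would be the more appropriate estimate.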


Brassica napus 

[0139] Clones of Brassica napus genes obtained from Hyseq were sequenced using
an ABI 377 slab gel sequencer and BigDye Terminator Ready Reaction kits (PE Biosystems,
Foster City, CA). Gene specific primers were designed using these sequences, and genes
were amplified from the plasmid supplied from Hyseq using touch-down PCR. In some
cases, primers were designed to add an "AACA" Kozak-like sequence just upstream of the
gene start codon and two bases downstream were, in some cases, changed to GC to facilitate
increased gene expression levels (Chandrashekhar et al., 1997, Plant Molecular Biology
35:993-1001). PCR reaction cycles were: 94°C, 5 minutes; 9 cycles of 94°C, 1 minute, 65°C,
1 minute, 72°C, 4 minutes and in which the anneal temperature was lowered by 1°C each
cycle; 20 cycles of 94°C, 1 minute, 55°C, 1 minute, 72°C, 4 minutes; and the PCR cycle was
ended with 72°C, 10 minutes. Amplified PCR products were gel purified from 1% agarose
gels using GenElute-EtBr spin columns (Sigma), and after standard enzymatic digestion,
were ligated into the plant binary vector pBPS-GB1 for transformation of Arabidopsis. The
binary vector was amplified by overnight growth in E. coli DH5 in LB media and appropriate
antibiotic, and plasmid was prepared for downstream steps using Qiagen MiniPrep DNA
preparation kit. The insert was verified throughout the various cloning steps by determining
its size through restriction digest and inserts were sequenced in parallel to plant
transformations to ensure the expected gene was used in Arabidopsis transformation.
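
The touch-down cycling described above can be written out step by step. The Python sketch below is an illustration, not a thermocycler protocol file: it assumes that the first of the 9 touch-down cycles anneals at 65°C and that each later one anneals 1°C lower (ending at 57°C), which is one reasonable reading of the text. The `Step` type and function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Step:
    temp_c: int   # block temperature in degrees C
    seconds: int  # hold time in seconds

def touchdown_program(start_anneal: int = 65, touchdown_cycles: int = 9,
                      plateau_anneal: int = 55,
                      plateau_cycles: int = 20) -> List[List[Step]]:
    """Build the program as a list of stages; each stage is a list of steps."""
    program: List[List[Step]] = [[Step(94, 5 * 60)]]  # initial denaturation
    # Touch-down phase: annealing temperature drops by 1 degree C per cycle.
    for i in range(touchdown_cycles):
        program.append([Step(94, 60), Step(start_anneal - i, 60), Step(72, 4 * 60)])
    # Plateau phase: fixed annealing temperature.
    for _ in range(plateau_cycles):
        program.append([Step(94, 60), Step(plateau_anneal, 60), Step(72, 4 * 60)])
    program.append([Step(72, 10 * 60)])  # final extension
    return program

program = touchdown_program()
anneal_temps = [stage[1].temp_c for stage in program[1:-1]]
total_seconds = sum(step.seconds for stage in program for step in stage)
print(anneal_temps[:9])    # [65, 64, 63, 62, 61, 60, 59, 58, 57]
print(total_seconds / 60)  # 189.0 minutes of programmed hold time
```

The total excludes ramp times between temperatures, so the real run would take longer.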




RT-PCR and cloning of LMP genes

[0140] Full-length LMP cDNAs were isolated by RT-PCR from Arabidopsis thaliana,
Brassica napus, or Physcomitrella patens RNA. The synthesis of the first strand cDNA was
achieved using AMV Reverse Transcriptase (Roche, Mannheim, Germany). The resulting
single-stranded cDNA was amplified via Polymerase Chain Reaction (PCR) utilizing two
gene-specific primers. The conditions for the reaction were standard conditions with Expand
High Fidelity PCR system (Roche). The parameters for the reaction were: five minutes at
94°C followed by five cycles of 40 seconds at 94°C, 40 seconds at 50°C, and 1.5 minutes at
72°C. This was followed by thirty cycles of 40 seconds at 94°C, 40 seconds at 65°C, and 1.5
minutes at 72°C. The fragments generated under these RT-PCR conditions were analyzed by
agarose gel electrophoresis to make sure that PCR products of the expected length had been
obtained.

[0141] Full-length LMP cDNAs were isolated by using synthetic oligonucleotide
primers (MWG-Biotech) designed based on the LMP gene specific DNA sequence that was
determined by EST sequencing and by sequencing of RACE PCR products. The 5' PCR
primers ("forward primer", F) for SEQ ID NO:83, SEQ ID NO:85, SEQ ID NO:87, SEQ ID
NO:89, SEQ ID NO:91, SEQ ID NO:93, SEQ ID NO:95, SEQ ID NO:97, SEQ ID NO:99,
SEQ ID NO:101, SEQ ID NO:103, SEQ ID NO:105, SEQ ID NO:107, SEQ ID NO:109,
SEQ ID NO:111, SEQ ID NO:113, and SEQ ID NO:115 contained an AscI restriction site 5'
upstream of the ATG start codon. The 5' PCR primers ("forward primer", F) for SEQ ID
NO:117, SEQ ID NO:119, SEQ ID NO:121, SEQ ID NO:123, SEQ ID NO:125, SEQ ID
NO:127, SEQ ID NO:129, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:137, SEQ ID
NO:139, SEQ ID NO:141, SEQ ID NO:143, SEQ ID NO:145, SEQ ID NO:147, SEQ ID
NO:149, SEQ ID NO:151, SEQ ID NO:153, SEQ ID NO:155, SEQ ID NO:157, SEQ ID
NO:159, SEQ ID NO:49, and SEQ ID NO:131, contained a NotI restriction site 5' upstream
of the ATG start codon. The 3' PCR primers ("reverse primers", R) for SEQ ID NO:84, SEQ
ID NO:86, SEQ ID NO:88, SEQ ID NO:90, SEQ ID NO:92, SEQ ID NO:94, SEQ ID NO:96,
SEQ ID NO:98, SEQ ID NO:100, SEQ ID NO:102, SEQ ID NO:104, SEQ ID NO:106, SEQ
ID NO:108, SEQ ID NO:110, SEQ ID NO:112, SEQ ID NO:114, and SEQ ID NO:116
contained a PacI restriction site 3' downstream of the stop codon. The 3' PCR primers
("reverse primers", R) for SEQ ID NO:118, SEQ ID NO:120, SEQ ID NO:122, SEQ ID
NO:124, SEQ ID NO:126, SEQ ID NO:128, SEQ ID NO:130, SEQ ID NO:134, SEQ ID
NO:136, SEQ ID NO:138, and SEQ ID NO:140, contained a NotI restriction site 3'
downstream of the stop codon. The 3' PCR primers ("reverse primers", R) for SEQ ID
NO:142, SEQ ID NO:144, SEQ ID NO:146, SEQ ID NO:148, SEQ ID NO:150, SEQ ID
NO:152, SEQ ID NO:156, SEQ ID NO:158, SEQ ID NO:160, SEQ ID NO:50, and SEQ ID
NO:132, contained a StuI restriction site 3' downstream of the stop codon. The 3' PCR
primers ("reverse primers", R) for SEQ ID NO:154 contained an EcoRV restriction site 3'
downstream of the stop codon.

[0142] The restriction sites were added so that the RT-PCR amplification products
could be cloned into the restriction sites located in the multiple cloning site of the binary
vector. The following "forward" (F) and "reverse" (R) primers were used to amplify the
full-length Arabidopsis thaliana or Brassica napus cDNAs by RT-PCR using RNA from
Arabidopsis thaliana or Brassica napus as original template:

For amplification of SEQ ID NO:1
Pk123F (5'-ATGGCGCGCCATGGCAATCTTCCGAAGTACACTAGT-3') (SEQ ID NO:83)
Pk123R (5'-GCTTAATTAATTAAGGGCACTTGAGACGGCCA-3') (SEQ ID NO:84)

For amplification of SEQ ID NO:3
Pk197F (5'-ATGGCGCGCCAACAATGGAGAATGGAGCAACGACG-3') (SEQ ID NO:85)
Pk197R (5'-GCTTAATTAACTATATGGTTGGATATTGAGTCTTGGC-3') (SEQ ID NO:86)

For amplification of SEQ ID NO:5
Pk136F (5'-ATGGCGCGCCATGGCTGAAAAAGTAAAGTCTGGTCA-3') (SEQ ID NO:87)
Pk136R (5'-GCTTAATTAATTATAGCTCCTCAGATCCCTCCGA-3') (SEQ ID NO:88)

For amplification of SEQ ID NO:7
Pk156F (5'-ATGGCGCGCCATGGCTGGAGAAGAAATAGAGAGGG-3') (SEQ ID NO:89)
Pk156R (5'-GCTTAATTAATTAAACAGAGGCTTCTCTACTCTCACTT-3') (SEQ ID NO:90)

For amplification of SEQ ID NO:9
Pk159F (5'-ATGGCGCGCCATGGCTGGAGTGATGAAGTTGGC-3') (SEQ ID NO:91)
Pk159R (5'-GCTTAATTAATCACCTCACGGTGTTGCAGTTG-3') (SEQ ID NO:92)

For amplification of SEQ ID NO:11
Pk179F (5'-ATGGCGCGCCAAACAATGGGGCTTGCTGTGGTGG-3') (SEQ ID NO:93)
Pk179R (5'-GCTTAATTAATTACTGCAAGGCTTTCAATATATTTC-3') (SEQ ID NO:94)

For amplification of SEQ ID NO:13
Pk202F (5'-ATGGCGCGCCAACAATGGCGTTCACGGCGCTTGT-3') (SEQ ID NO:95)
Pk202R (5'-GCTTAATTAATCAACAAGTAGGATAAGGAACACCACA-3') (SEQ ID NO:96)

For amplification of SEQ ID NO:15
Pk206F (5'-ATGGCGCGCCAACAATGGCCCTTGATGAGCTTCTCAAG-3') (SEQ ID NO:97)
Pk206R (5'-GCTTAATTAATCAGAGAGAAGCAGAGTTTGTTCGC-3') (SEQ ID NO:98)

For amplification of SEQ ID NO:17
Pk207F (5'-ATGGCGCGCCAACAATGGCGCAATCCCGATTATTAG-3') (SEQ ID NO:99)
Pk207R (5'-GCTTAATTAATTAAAACCACTCGCCTCTCATTTC-3') (SEQ ID NO:100)

For amplification of SEQ ID NO:19
Pk209F (5'-ATGGCGCGCCATGTCCGTGGCTCGATTCGAT-3') (SEQ ID NO:101)
Pk209R (5'-GCTTAATTAACTAATCCTCTAGCTCGATGATTTTGAC-3') (SEQ ID NO:102)

For amplification of SEQ ID NO:21
Pk215F (5'-ATGGCGCGCCAACAATGGCGATTTACAGATCTCTAAGAAAG-3') (SEQ ID NO:103)
Pk215R (5'-GCTTAATTAATTACCTTAGATAAGTGATCCATGTCTGG-3') (SEQ ID NO:104)

For amplification of SEQ ID NO:23
Pk239F (5'-ATGGCGCGCCAACAATGGTAAAGGAAACTCTAATTCCTCCG-3') (SEQ ID NO:105)
Pk239R (5'-GCTTAATTAACTACCAGCCGAAGATTGGCTTGT-3') (SEQ ID NO:106)

For amplification of SEQ ID NO:25
Pk240F (5'-ATGGCGCGCCATTTGGAGAGCAATGGCGACTT-3') (SEQ ID NO:107)
Pk240R (5'-GCTTAATTAATTACATCGAACGAAGAAGCATCAA-3') (SEQ ID NO:108)

For amplification of SEQ ID NO:27
Pk241F (5'-ATGGCGCGCCCATCCTCAGAAAGAATGGCTCAAA-3') (SEQ ID NO:109)
Pk241R (5'-GCTTAATTAATTAGCTTTCTTCACCATCATCGGTG-3') (SEQ ID NO:110)

For amplification of SEQ ID NO:29
Pk242F (5'-ATGGCGCGCCAACAATGGGTGCAGGTGGAAGAATGCC-3') (SEQ ID NO:111)
Pk242R (5'-GCTTAATTAATCATAACTTATTGTTGTACCAGTACACACC-3') (SEQ ID NO:112)

For amplification of SEQ ID NO:31
Bn011F (5'-ATGGCGCGCCAACAATGGCTTCAATAAATGAAGATGTGTCT-3') (SEQ ID NO:113)
Bn011R (5'-GACTTAATTAATCAATTGGTGGGATTAACGACTCCA-3') (SEQ ID NO:114)

For amplification of SEQ ID NO:33
Bn077F (5'-ATGGCGCGCCAACAATGGCTACATTCTCTTGTAATTCTTATGA-3') (SEQ ID NO:115)
Bn077R (5'-GACTTAATTAATCAGAAGCGGCCATTAAAATTACCCA-3') (SEQ ID NO:116)

For amplification of SEQ ID NO:35
Jb001F (5'-ATAAGAATGCGGCCGCCATGGCAACGGAATGCATTGCA-3') (SEQ ID NO:117)
Jb001R (5'-ATAAGAATGCGGCCGCTTAGAAACTTCTTCTGTTCTT-3') (SEQ ID NO:118)

For amplification of SEQ ID NO:37
Jb002F (5'-ATAAGAATGCGGCCGCCATGGCGTCAGAGCAAGCAAGG-3') (SEQ ID NO:119)
Jb002R (5'-ATAAGAATGCGGCCGCTCAACGTTGTCCATGTTCCCG-3') (SEQ ID NO:120)

For amplification of SEQ ID NO:39
Jb003F (5'-ATAAGAATGCGGCCGCCATGGCTAAGTCTTGCTATTTCA-3') (SEQ ID NO:121)
Jb003R (5'-ATAAGAATGCGGCCGCTCAGGCGCTATAGCCTAAGATT-3') (SEQ ID NO:122)

For amplification of SEQ ID NO:41
Jb005F (5'-ATAAGAATGCGGCCGCCATGGACGGTGCCGGAGAATCACGA-3') (SEQ ID NO:123)
Jb005R (5'-ATAAGAATGCGGCCGCCTAATAACTTAAAGTTACCGGA-3') (SEQ ID NO:124)

For amplification of SEQ ID NO:43
Jb007F (5'-ATAAGAATGCGGCCGCCATGTCGAGAGCTTTGTCAGTCG-3') (SEQ ID NO:125)
Jb007R (5'-ATAAGAATGCGGCCGCCATGTCGAGAGCTTTGTCAGTCG-3') (SEQ ID NO:126)

For amplification of SEQ ID NO:45
Jb009F (5'-ATAAGAATGCGGCCGCCATGGCAAGCAGCGACGTGAAGCT-3') (SEQ ID NO:127)
Jb009R (5'-ATAAGAATGCGGCCGCTCAACCAAGCCAAGAAGCACCC-3') (SEQ ID NO:128)

For amplification of SEQ ID NO:47
Jb013F (5'-ATAAGAATGCGGCCGCCATGGCGTCTCAACAAGAGAAGA-3') (SEQ ID NO:129)
Jb013R (5'-ATAAGAATGCGGCCGCTTAGGTCTTGGTCCTGAATTTG-3') (SEQ ID NO:130)

For amplification of SEQ ID NO:51
Jb017F (5'-ATAAGAATGCGGCCGCCATGGCTCCTTCAACAAAAGTTC-3') (SEQ ID NO:133)
Jb017R (5'-ATAAGAATGCGGCCGCTCAAACACTGCTGATAGTATTT-3') (SEQ ID NO:134)

For amplification of SEQ ID NO:53
Jb024F (5'-ATAAGAATGCGGCCGCCATGCGGTGCTTTCCACCTCCCT-3') (SEQ ID NO:135)
Jb024R (5'-ATAAGAATGCGGCCGCTTAGTTTTGTAATGGTGAGAGC-3') (SEQ ID NO:136)

For amplification of SEQ ID NO:55
Jb027F (5'-ATAAGAATGCGGCCGCCATGCTTCTAATTCTAGCGATTT-3') (SEQ ID NO:137)
Jb027R (5'-ATAAGAATGCGGCCGCTCAGATAACCTTCTTCTTCTCG-3') (SEQ ID NO:138)






For amplification of SEQ ID NO:57
OO-1F (5'-ATTGCGGCCGCACAATGGCACATGCCACGTTTACG-3') (SEQ ID NO:139)
OO-1R (5'-ATTGCGGCCGCTTAGTCTTCATGGTCCCATAGATC-3') (SEQ ID NO:140)

For amplification of SEQ ID NO:59
OO-2F (5'-GCGGCCGCCATGGCGTCTGAGAAACAAAAAC-3') (SEQ ID NO:141)
OO-2R (5'-AGGCCTTTACGCATTTACCACAGCTCC-3') (SEQ ID NO:142)

For amplification of SEQ ID NO:61
OO-3F (5'-GCGGCCGCATGGATTCAACGAAGCTTAGTGAGC-3') (SEQ ID NO:143)
OO-3R (5'-AGGCCTTTACTGAGGTCCTGCAAATTTG-3') (SEQ ID NO:144)

For amplification of SEQ ID NO:63
OO-4F (5'-GCGGCCGCCATGAAGGTTCACGAGACAAGA-3') (SEQ ID NO:145)
OO-4R (5'-AGGCCTCTACTCTGGTTCGACATCGAC-3') (SEQ ID NO:146)

For amplification of SEQ ID NO:65
OO-5F (5'-GCGGCCGCCATGTCTACCCCAGCTGAATC-3') (SEQ ID NO:147)
OO-5R (5'-AGGCCTCTAATTGTAGAGATCATCATC-3') (SEQ ID NO:148)

For amplification of SEQ ID NO:67
OO-6F (5'-GCGGCCGCCATGGACAAATCTAGTACCATG-3') (SEQ ID NO:149)
OO-6R (5'-AGGCCTTCAGCTACCACCCTTTTGTTTGAG-3') (SEQ ID NO:150)

For amplification of SEQ ID NO:69
OO-8F (5'-GCGGCCGCCATGGCGAAATCTCAGATCTGG-3') (SEQ ID NO:151)
OO-8R (5'-AGGCCTTTAAGAAGAAGCAACGAACGTG-3') (SEQ ID NO:152)

For amplification of SEQ ID NO:71
OO-9F (5'-GCGGCCGCCATGGCGTCGAGCGATGAGCG-3') (SEQ ID NO:153)
OO-9R (5'-GATATCTTACGGGAACGGAGCCAATTTC-3') (SEQ ID NO:154)

For amplification of SEQ ID NO:73
OO-10F (5'-GCGGCCGCCATGGCGACTCTTAAGGTTTCTG-3') (SEQ ID NO:155)
OO-10R (5'-AGGCCTTTAAGCATCATCTTCACCGAG-3') (SEQ ID NO:156)

For amplification of SEQ ID NO:75
OO-11F (5'-GCGGCCGCCATGGTGGATCTATTGAACTCG-3') (SEQ ID NO:157)
OO-11R (5'-AGGCCTTTACAACTCTTGGATATTAAAC-3') (SEQ ID NO:158)

For amplification of SEQ ID NO:77
OO-12F (5'-GCGGCCGCCATGGCTGGAAAACTCATGCAC-3') (SEQ ID NO:159)
OO-12R (5'-AGGCCTTTATGGCTCGACAATGATCTTC-3') (SEQ ID NO:160)

For amplification of SEQ ID NO:79
pp82F (5'-ATGGCGCGCCCGACATGAAGCGACGTTGAACG-3') (SEQ ID NO:49)
pp82R (5'-GCTTAATTAACTTTCCGCAGCCTTCAGGCCGC-3') (SEQ ID NO:50)

For amplification of SEQ ID NO:81
Pk225F (5'-GGTTAATTAAGGCGCGCCCCCGGAAGCGATGCTGAG-3') (SEQ ID NO:131)
Pk225R (5'-ATCTCGAGGACGTCCCACAGCCACCGGATTC-3') (SEQ ID NO:132)
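
A primer list of this size is easy to check mechanically for the cloning sites it is meant to carry. The Python sketch below scans a handful of the primers listed above for the recognition sequences of AscI (GGCGCGCC), PacI (TTAATTAA), NotI (GCGGCCGC), StuI (AGGCCT), and EcoRV (GATATC). It is a minimal sketch, not part of the described method: the helper name `sites_in` is hypothetical, and only five primers are included as examples.

```python
from typing import Dict, List

# Recognition sequences of the enzymes named in the text. All five are
# palindromic, so scanning one strand is sufficient.
SITES: Dict[str, str] = {
    "AscI": "GGCGCGCC",
    "PacI": "TTAATTAA",
    "NotI": "GCGGCCGC",
    "StuI": "AGGCCT",
    "EcoRV": "GATATC",
}

def sites_in(primer: str) -> List[str]:
    """Return the names of the enzymes whose site occurs in the primer."""
    seq = primer.upper().replace(" ", "")  # tolerate stray spaces from OCR
    return [name for name, site in SITES.items() if site in seq]

# A few primers from the list above, keyed by SEQ ID NO.
PRIMERS: Dict[str, str] = {
    "SEQ ID NO:83": "ATGGCGCGCCATGGCAATCTTCCGAAGTACACTAGT",
    "SEQ ID NO:84": "GCTTAATTAATTAAGGGCACTTGAGACGGCCA",
    "SEQ ID NO:117": "ATAAGAATGCGGCCGCCATGGCAACGGAATGCATTGCA",
    "SEQ ID NO:142": "AGGCCTTTACGCATTTACCACAGCTCC",
    "SEQ ID NO:154": "GATATCTTACGGGAACGGAGCCAATTTC",
}

found = {seq_id: sites_in(seq) for seq_id, seq in PRIMERS.items()}
for seq_id, names in found.items():
    print(seq_id, names)
```

Extending `PRIMERS` to the full list would flag any primer whose sequence does not carry the site that the preceding paragraph assigns to it.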



Example 7 

Identification of Genes of Interest by Screening Expression Libraries with Antibodies 
[0143] The cDNA clones can be used to produce recombinant protein, for example, in
E. coli (e.g. Qiagen QIAexpress pQE system). Recombinant proteins are then normally
affinity purified via Ni-NTA affinity chromatography (Qiagen). Recombinant proteins can
be used to produce specific antibodies, for example by using standard techniques for rabbit
immunization. Antibodies are affinity purified using a Ni-NTA column saturated with the
recombinant antigen as described by Gu et al. (1994, BioTechniques 17:257-262). The
antibody can then be used to screen expression cDNA libraries to identify homologous or
heterologous genes via an immunological screening (Sambrook et al., 1989, "Molecular
Cloning: A Laboratory Manual", Cold Spring Harbor Laboratory Press; or Ausubel et al.
1994, "Current Protocols in Molecular Biology", John Wiley & Sons).

Example 8

Northern-Hybridization 

[0144] For RNA hybridization, 20 µg of total RNA or 1 µg of poly-(A)+ RNA was
separated by gel electrophoresis in 1.25% strength agarose gels using formaldehyde as
described in Amasino (1986, Anal. Biochem. 152:304), transferred by capillary attraction
using 10 x SSC to positively charged nylon membranes (Hybond N+, Amersham,
Braunschweig), immobilized by UV light, and pre-hybridized for 3 hours at 68°C using
hybridization buffer (10% dextran sulfate w/v, 1 M NaCl, 1% SDS, 100 µg/ml of herring
sperm DNA). The labeling of the DNA probe with the Highprime DNA labeling kit (Roche,
Mannheim, Germany) was carried out during the pre-hybridization using alpha-32P dCTP
(Amersham, Braunschweig, Germany). Hybridization was carried out after addition of the 
labeled DNA probe in the same buffer at 68°C overnight. The washing steps were carried out 
twice for 15 minutes using 2 x SSC and twice for 30 minutes using 1 x SSC, 1% SDS at 
68°C. The exposure of the sealed filters was carried out at -70°C for a period of 1 day to 14 
days. 

Example 9 

DNA Sequencing and Computational Functional Analysis 

[0145] The SSH cDNA library as described in Examples 4 and 5 was used for DNA
sequencing according to standard methods, in particular by the chain termination method
using the ABI PRISM Big Dye Terminator Cycle Sequencing Ready Reaction Kit (Perkin-Elmer,
Weiterstadt, Germany). Random sequencing was carried out subsequent to
preparative plasmid recovery from cDNA libraries via in vivo mass excision,
retransformation, and subsequent plating of DH10B on agar plates (material and protocol
details from Stratagene, Amsterdam, Netherlands). Plasmid DNA was prepared from
overnight grown E. coli cultures grown in Luria-Broth medium containing ampicillin (See
Sambrook et al. (1989, Cold Spring Harbor Laboratory Press: ISBN 0-87969-309-6)) on a
Qiagen DNA preparation robot (Qiagen, Hilden) according to the manufacturer's protocols.
Sequencing primers with the following nucleotide sequences were used:

5'-CAGGAAACAGCTATGACC-3' SEQ ID NO:161
5'-CTAAAGGGAACAAAAGCTG-3' SEQ ID NO:162
5'-TGTAAAACGACGGCCAGT-3' SEQ ID NO:163

[0146] Sequences were processed and annotated using the software package EST-MAX
commercially provided by Bio-Max (Munich, Germany). The program incorporates
practically all bioinformatics methods important for functional and structural characterization
of protein sequences. For reference see http://pedant.mips.biochem.mpg.de.

[0147] The most important algorithms incorporated in EST-MAX are: FASTA: Very
sensitive protein sequence database searches with estimates of statistical significance
(Pearson W.R., 1990, Rapid and sensitive sequence comparison with FASTP and FASTA.
Methods Enzymol. 183:63-98). BLAST: Very sensitive protein sequence database searches
with estimates of statistical significance (Altschul S.F., Gish W., Miller W., Myers E.W. and
Lipman D.J. Basic local alignment search tool. J. Mol. Biol. 215:403-410). PREDATOR:
High-accuracy secondary structure prediction from single and multiple sequences (Frishman
& Argos 1997, 75% accuracy in protein secondary structure prediction. Proteins 27:329-335).
CLUSTALW: Multiple sequence alignment (Thompson, J.D., Higgins, D.G. and Gibson, T.J.
1994, CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment
through sequence weighting, positions-specific gap penalties and weight matrix choice,
Nucleic Acids Res. 22:4673-4680). TMAP: Transmembrane region prediction from multiply
aligned sequences (Persson B. & Argos P. 1994, Prediction of transmembrane segments in
proteins utilizing multiple sequence alignments, J. Mol. Biol. 237:182-192).
ALOM2: Transmembrane region prediction from single sequences (Klein P., Kanehisa M.,
and DeLisi C. 1984, Prediction of protein function from sequence properties: A discriminant
analysis of a database. Biochim. Biophys. Acta 787:221-226. Version 2 by Dr. K. Nakai).
PROSEARCH: Detection of PROSITE protein sequence patterns (Kolakowski L.F. Jr.,
Leunissen J.A.M. and Smith J.E. 1992, ProSearch: fast searching of protein sequences with
regular expression patterns related to protein structure and function. Biotechniques 13:919-921).
BLIMPS: Similarity searches against a database of ungapped blocks (Wallace &
Henikoff 1992, PATMAT: A searching and extraction program for sequence, pattern and
block queries and databases, CABIOS 8:249-254. Written by Bill Alford).

Example 10 

Plasmids for Plant Transformation 

[0148] For plant transformation, various binary vectors such as a pBPS plant binary
vector were used. Construction of the plant binary vectors was performed by ligation of the
cDNA in sense or antisense orientation into the vector. In such vectors, a plant promoter was
located 5-prime to the cDNA, where it activated transcription of the cDNA; and a
polyadenylation sequence was located 3-prime to the cDNA. Various plant promoters were
used such as a constitutive promoter (Superpromoter), a seed-specific promoter, and a
root-specific promoter. Tissue-specific expression was achieved by using a tissue-specific
promoter. For example, in some instances, seed-specific expression was achieved by cloning
the napin or LeB4 or USP promoter 5-prime to the cDNA. Also, any other seed specific
promoter element can be used, and such promoters are well known to one of ordinary skill in
the art. For constitutive expression within the whole plant, in some instances, the
Superpromoter or the CaMV 35S promoter was used. The expressed protein also can be
targeted to a cellular compartment using a signal peptide, for example for plastids,
mitochondria, or endoplasmic reticulum (Kermode, 1996, Crit. Rev. Plant Sci. 15:285-423).
The signal peptide is cloned 5-prime in frame to the cDNA to achieve subcellular localization
of the fusion protein.

[0149] The plant binary vectors comprised a selectable marker gene driven under the
control of one of various plant promoters, such as the AtAct2-I promoter and the
Nos-promoter; the LMP candidate cDNA under the control of a root-specific promoter, a
seed-specific promoter, a non-tissue specific promoter, or a constitutive promoter; and a
terminator. Partial or full-length LMP cDNA was cloned into the plant binary vector in sense
or antisense orientation behind the desired promoter. The recombinant vector containing the
gene of interest was transformed into TOP10 cells (Invitrogen) using standard conditions.
Transformed cells were selected for on LB agar containing the selective agent, and cells were
grown overnight at 37°C. Plasmid DNA was extracted using the QIAprep Spin Miniprep Kit
(Qiagen) following manufacturer's instructions. Analysis of subsequent clones and
restriction mapping was performed according to standard molecular biology techniques
(Sambrook et al., 1989, Molecular Cloning, A Laboratory Manual. 2nd Edition. Cold Spring
Harbor Laboratory Press. Cold Spring Harbor, NY).

Example 11 

Agrobacterium Mediated Plant Transformation

[0150] Agrobacterium mediated plant transformation with the LMP nucleic acids described
herein can be performed using standard transformation and regeneration techniques (Gelvin,
Stanton B. & Schilperoort R.A., Plant Molecular Biology Manual, 2nd ed. Kluwer Academic
Publ., Dordrecht 1995 in Sect., Ringbuc Zentrale Signatur: BT11-P; Glick, Bernard R. and
Thompson, John E. Methods in Plant Molecular Biology and Biotechnology, S. 360, CRC
Press, Boca Raton 1993). For example, Agrobacterium mediated transformation can be
performed using the GV3 (pMP90) (Koncz & Schell, 1986, Mol. Gen. Genet. 204:383-396)
or LBA4404 (Clontech) Agrobacterium tumefaciens strain.

[0151] Arabidopsis thaliana can be grown and transformed according to standard
conditions (Bechtold, 1993, Acad. Sci. Paris. 316:1194-1199; Bent et al., 1994, Science
265:1856-1860). Additionally, rapeseed can be transformed with the LMP nucleic acids of
the present invention via cotyledon or hypocotyl transformation (Moloney et al., 1989, Plant
Cell Report 8:238-242; De Block et al., 1989, Plant Physiol. 91:694-701). Use of antibiotics
for Agrobacterium and plant selection depends on the binary vector and the Agrobacterium
strain used for transformation. Rapeseed selection is normally performed using kanamycin as
selectable plant marker. Additionally, Agrobacterium mediated gene transfer to flax can be
performed using, for example, a technique described by Mlynarova et al. (1994, Plant Cell
Report 13:282-285).

[0152] Transformation of soybean can be performed using, for example, a technique
described in EP 0424 047, U.S. Patent No. 5,322,783 (Pioneer Hi-Bred International) or in
EP 0397 687, U.S. Patent No. 5,376,543 or U.S. Patent No. 5,169,770 (University Toledo).
Soybean seeds are surface sterilized with 70% ethanol for 4 minutes at room temperature
with continuous shaking, followed by 20% (v/v) Clorox supplemented with 0.05% (v/v)
Tween for 20 minutes with continuous shaking. Then the seeds are rinsed four times with
distilled water and placed on moistened sterile filter paper in a Petri dish at room temperature
for 6 to 39 hours. The seed coats are peeled off, and cotyledons are detached from the
embryo axis. The embryo axis is examined to make sure that the meristematic region is not
damaged. The excised embryo axes are collected in a half-open sterile Petri dish and
air-dried to a moisture content less than 20% (fresh weight) in a sealed Petri dish until further
use.

[0153] The method of plant transformation is also applicable to Brassica and other
crops. In particular, seeds of canola are surface sterilized with 70% ethanol for 4 minutes at
room temperature with continuous shaking, followed by 20% (v/v) Clorox supplemented with
0.05% (v/v) Tween for 20 minutes at room temperature with continuous shaking. Then, the
seeds are rinsed 4 times with distilled water and placed on moistened sterile filter paper in a
Petri dish at room temperature for 18 hours. The seed coats are removed and the seeds are air
dried overnight in a half-open sterile Petri dish. During this period, the seeds lose
approximately 85% of their water content. The seeds are then stored at room temperature in
a sealed Petri dish until further use.

[0154] Agrobacterium tumefaciens culture is prepared from a single colony in LB
solid medium plus appropriate antibiotics (e.g. 100 mg/l streptomycin, 50 mg/l kanamycin)
followed by growth of the single colony in liquid LB medium to an optical density at 600 nm
of 0.8. Then, the bacteria culture is pelleted at 7000 rpm for 7 minutes at room temperature,
and resuspended in MS (Murashige & Skoog, 1962, Physiol. Plant. 15:473-497) medium
supplemented with 100 mM acetosyringone. Bacteria cultures are incubated in this
pre-induction medium for 2 hours at room temperature before use. The axis of soybean zygotic
seed embryos at approximately 44% moisture content are imbibed for 2 h at room
temperature with the pre-induced Agrobacterium suspension culture. (The imbibition of dry
embryos with a culture of Agrobacterium is also applicable to maize embryo axes).


[0155] The embryos are removed from the imbibition culture and are transferred to
Petri dishes containing solid MS medium supplemented with 2% sucrose and incubated for 2
days in the dark at room temperature. Alternatively, the embryos are placed on top of
moistened (liquid MS medium) sterile filter paper in a Petri dish and incubated under the
same conditions described above. After this period, the embryos are transferred to either
solid or liquid MS medium supplemented with 500 mg/l carbenicillin or 300 mg/l cefotaxime
to kill the agrobacteria. The liquid medium is used to moisten the sterile filter paper. The
embryos are incubated for 4 weeks at 25°C, under 440 µmol m⁻² s⁻¹ and a 12 hour
photoperiod. Once the seedlings have produced roots, they are transferred to sterile
metromix soil. The medium of the in vitro plants is washed off before transferring the plants
to soil. The plants are kept under a plastic cover for 1 week to favor the acclimatization
process. Then the plants are transferred to a growth room where they are incubated at 25°C,
under 440 µmol m⁻² s⁻¹ light intensity and a 12 h photoperiod for about 80 days.

[0156] Samples of the primary transgenic plants (T0) are analyzed by PCR to confirm
the presence of T-DNA. These results are confirmed by Southern hybridization wherein
DNA is electrophoresed on a 1% agarose gel and transferred to a positively charged nylon
membrane (Roche Diagnostics). The PCR DIG Probe Synthesis Kit (Roche Diagnostics) is
used to prepare a digoxigenin-labeled probe by PCR as recommended by the manufacturer.

Example 12 

In vivo Mutagenesis

[0157] In vivo mutagenesis of microorganisms can be performed by incorporation and
passage of the plasmid (or other vector) DNA through E. coli or other microorganisms (e.g.
Bacillus spp. or yeasts such as Saccharomyces cerevisiae) which are impaired in their
capabilities to maintain the integrity of their genetic information. Typical mutator strains
have mutations in the genes for the DNA repair system (e.g., mutHLS, mutD, mutT, etc.; for
reference, see Rupp W.D. 1996, DNA repair mechanisms, in: Escherichia coli and
Salmonella, p. 2277-2294, ASM: Washington). Such strains are well known to those skilled
in the art. The use of such strains is illustrated, for example, in Greener and Callahan, 1994,
Strategies 7:32-34. Transfer of mutated DNA molecules into plants is preferably done after
selection and testing in microorganisms. Transgenic plants are generated according to various
examples within the exemplification of this document.

Example 13 

Assessment of the mRNA Expression and Activity of a Recombinant Gene Product in the
Transformed Organism

[0158] The activity of a recombinant gene product in the transformed host organism
can be measured on the transcriptional level and/or on the translational level. A useful
method to ascertain the level of transcription of the gene (an indicator of the amount of
mRNA available for translation to the gene product) is to perform a Northern blot (for
reference see, for example, Ausubel et al., 1988, Current Protocols in Molecular Biology,
Wiley: New York), in which a primer designed to bind to the gene of interest is labeled with
a detectable tag (usually radioactive or chemiluminescent), such that when the total RNA of a
culture of the organism is extracted, run on a gel, transferred to a stable matrix and incubated
with this probe, the binding and quantity of binding of the probe indicate the presence and
also the quantity of mRNA for this gene. This information at least partially demonstrates the
degree of transcription of the transformed gene. Total cellular RNA can be prepared from
plant cells, tissues or organs by several methods, all well known in the art, such as that
described in Bormann et al. (1992, Mol. Microbiol. 6:317-326).

[0159] To assess the presence or relative quantity of protein translated from this
mRNA, standard techniques, such as a Western blot, may be employed (see, for example,
Ausubel et al., 1988, Current Protocols in Molecular Biology, Wiley: New York). In this
process, total cellular proteins are extracted, separated by gel electrophoresis, transferred to a
matrix such as nitrocellulose, and incubated with a probe, such as an antibody, which
specifically binds to the desired protein. This probe is generally tagged with a
chemiluminescent or colorimetric label which may be readily detected. The presence and
quantity of label observed indicate the presence and quantity of the desired mutant protein
present in the cell.

[0160] The activity of LMPs that bind to DNA can be measured by several well-established
methods, such as DNA band-shift assays (also called gel retardation assays). The
effect of such an LMP on the expression of other molecules can be measured using reporter gene
assays (such as that described in Kolmar H. et al., 1995, EMBO J. 14:3895-3904 and
references cited therein). Reporter gene test systems are well known and established for
applications in both prokaryotic and eukaryotic cells, using enzymes such as
beta-galactosidase, green fluorescent protein, and several others.

[0161] The determination of activity of lipid metabolism membrane-transport proteins
can be performed according to techniques such as those described in Gennis R.B. (1989,
Pores, Channels and Transporters, in Biomembranes, Molecular Structure and Function,
Springer: Heidelberg, pp. 85-137, 199-234 and 270-322).

Example 14 

In vitro Analysis of the Function of Arabidopsis thaliana and Brassica napus Genes in
Transgenic Plants

[0162] The determination of activities and kinetic parameters of enzymes is well
established in the art. Experiments to determine the activity of any given altered enzyme
must be tailored to the specific activity of the wild-type enzyme, which is well within the
ability of one skilled in the art. Overviews about enzymes in general, as well as specific
details concerning structure, kinetics, principles, methods, applications and examples for the
determination of many enzyme activities may be found, for example, in the following
references: Dixon, M. & Webb, E.C., 1979, Enzymes. Longmans: London; Fersht, 1985,
Enzyme Structure and Mechanism. Freeman: New York; Walsh, 1979, Enzymatic Reaction
Mechanisms. Freeman: San Francisco; Price, N.C., Stevens, L., 1982, Fundamentals of
Enzymology. Oxford Univ. Press: Oxford; Boyer, P.D., ed. (1983) The Enzymes, 3rd ed.
Academic Press: New York; Bisswanger, H., 1994, Enzymkinetik, 2nd ed. VCH: Weinheim
(ISBN 3527300325); Bergmeyer, H.U., Bergmeyer, J., Graßl, M., eds. (1983-1986) Methods
of Enzymatic Analysis, 3rd ed., vol. I-XII, Verlag Chemie: Weinheim; and Ullmann's
Encyclopedia of Industrial Chemistry (1987) vol. A9, Enzymes. VCH: Weinheim, p. 352-363.
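
As an illustration of what determining kinetic parameters can involve, the following minimal sketch estimates Vmax and Km from initial-rate data by the Hanes-Woolf linearization (S/v plotted against S). It assumes simple Michaelis-Menten behavior; the function name and the example data are hypothetical and are not taken from the references above.

```python
def fit_michaelis_menten(substrate, rate):
    """Estimate (Vmax, Km) by least squares on the Hanes-Woolf form.

    Hanes-Woolf: S/v = S/Vmax + Km/Vmax, i.e. a straight line in S
    with slope 1/Vmax and intercept Km/Vmax.
    """
    xs = list(substrate)
    ys = [s / v for s, v in zip(substrate, rate)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


# Synthetic, noise-free data generated with Vmax = 10 and Km = 2 (illustrative only).
conc = [0.5, 1.0, 2.0, 4.0, 8.0]
velocity = [10.0 * s / (2.0 + s) for s in conc]
vmax_est, km_est = fit_michaelis_menten(conc, velocity)
print(f"Vmax = {vmax_est:.3f}, Km = {km_est:.3f}")
```

With real measurements, a direct nonlinear fit is often preferred because linearizations weight experimental error unevenly; the linear form is used here only to keep the sketch dependency-free.
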

Example 15 

Analysis of the Impact of Recombinant LMPs on the Production of a Desired Seed Storage 
Compound: Fatty Acid Production 

[0163] The total fatty acid content of Arabidopsis seeds was determined by
saponification of seeds in 0.5 M KOH in methanol at 80°C for 2 hours followed by LC-MS
analysis of the free fatty acids. Total fatty acid content of seeds of control and transgenic
plants was measured with bulked seeds (usually 5 mg seed weight) of a single plant. Three
different types of controls have been used: Col-2 (Columbia-2, the Arabidopsis ecotype in
which SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID
NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21,
SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID
NO:33, SEQ ID NO:79, or SEQ ID NO:81 has been transformed), Col-0 (Columbia-0, the
Arabidopsis ecotype in which SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID
NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:51, SEQ ID NO:53,
SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID
NO:65, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, or
SEQ ID NO:77 has been transformed), C-24 (an Arabidopsis ecotype found to accumulate
high amounts of total fatty acids in seeds), and the BPS empty (without an LMP gene of
interest) binary vector construct. The controls indicated in the tables below have been grown
side by side with the transgenic lines. Differences in the total values of the controls are
explained either by differences in the growth conditions, which were found to be very
sensitive to small variations in the plant cultivation, or by differences in the standards added
to quantify the fatty acid content. Because of the seed bulking, all values obtained with T2
seeds, and in part also with T3 seeds, are the result of a mixture of homozygous (for the gene
of interest) and heterozygous events, implying that these data underestimate the LMP gene
effect.
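
Because the tables below report only means, standard deviations and the number of plants, a reader may want to put a transgenic line and its control on a common scale. The following minimal sketch computes the relative change and a Welch t statistic from such summary values. It is an illustrative calculation, not an analysis performed in this document; the function names are hypothetical, and the example numbers are the Col-2 empty vector control and the transgenic line of Table 6 (6 plants each).

```python
import math


def relative_change_percent(control_mean, line_mean):
    """Percent change of the transgenic line relative to the control."""
    return 100.0 * (line_mean - control_mean) / control_mean


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic and approximate degrees of freedom (b minus a)."""
    var_a = sd_a ** 2 / n_a
    var_b = sd_b ** 2 / n_b
    t_stat = (mean_b - mean_a) / math.sqrt(var_a + var_b)
    dof = (var_a + var_b) ** 2 / (
        var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1)
    )
    return t_stat, dof


# Summary values as printed in Table 6 (g total fatty acids/g seed weight).
control = (0.347, 0.024, 6)   # Col-2 empty vector control
line = (0.366, 0.014, 6)      # transgenic seeds

change = relative_change_percent(control[0], line[0])
t_stat, dof = welch_t(*control, *line)
print(f"relative change: {change:.1f}%  t = {t_stat:.2f}  df = {dof:.1f}")
```

Converting t and the degrees of freedom into a p-value requires a t-distribution table or a statistics library, which is left out to keep the sketch self-contained.
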

[0164] Table 5. Determination of the T2 seed total fatty acid content of transgenic lines of
pk123 (containing SEQ ID NO:1). Shown are the means (± standard deviation); number of
individual measurements per plant line: 12-20. Col-2 is the Arabidopsis ecotype into which
the LMP gene has been transformed; C-24 is a high-oil Arabidopsis ecotype used as another
control.

Genotype                        g total fatty acids/g seed weight
C-24 wild-type control          0.318 ± 0.022
Col-2 wild-type control         0.300 ± 0.023
Pk123 transgenic seeds          0.319 ± 0.024

[0165] Table 6. Determination of the T2 seed total fatty acid content of transgenic lines of
pk197 (containing SEQ ID NO:3). Shown are the means (± standard deviation) of 6
individual plants per line.

Genotype                        g total fatty acids/g seed weight
C-24 wild-type control          0.371 ± 0.010
Col-2 wild-type control         0.353 ± 0.017
Col-2 empty vector control      0.347 ± 0.024
Pk197 transgenic seeds          0.366 ± 0.014

[0166] Table 7. Determination of the T2 seed total fatty acid content of transgenic lines of
pk136 (containing SEQ ID NO:5). Shown are the means (± standard deviation) of 6
individual plants per line.

Genotype                        g total fatty acids/g seed weight
C-24 wild-type control          0.351 ± 0.052
Col-2 wild-type control         0.344 ± 0.026
Col-2 empty vector control      0.346 ± 0.019
Pk136 transgenic seeds          0.374 ± 0.033

[0167] Table 8. Determination of the T2 seed total fatty acid content of transgenic lines of
pk156 (containing SEQ ID NO:7). Shown are the means (± standard deviation) of 6
individual plants per line.

Genotype                        g total fatty acids/g seed weight
C-24 wild-type control          0.400 ± 0.001
Col-2 wild-type control         0.369 ± 0.043
Pk156 transgenic seeds          0.389 ± 0.007

[0168] Table 9. Determination of the T2 seed total fatty acid content of transgenic lines of
pk159 (containing SEQ ID NO:9). Shown are the means (± standard deviation) of 6
individual plants per line.

Genotype                        g total fatty acids/g seed weight
C-24 wild-type control          0.413 ± 0.019
Col-2 wild-type control         0.381 ± 0.019
Pk159 transgenic seeds          0.409 ± 0.008

[0169] Table 10. Determination of the T2 seed total fatty acid content of transgenic lines of
pk179 (containing SEQ ID NO:11). Shown are the means (± standard deviation) of 6
individual plants per line.

Genotype                        g total fatty acids/g seed weight
C-24 wild-type control          0.400 ± 0.033
Col-2 wild-type control         0.339 ± 0.033
Col-2 empty vector control      0.357 ± 0.021
Pk179 transgenic seeds          0.384 ± 0.020

[0170] Table 11. Determination of the T2 seed total fatty acid content of transgenic lines of
pk202 (containing SEQ ID NO:13). Shown are the means (± standard deviation) of 6
individual plants per line.

Genotype                        g total fatty acids/g seed weight
C-24 wild-type control          0.413 ± 0.019
Col-2 wild-type control         0.381 ± 0.019
Col-2 empty vector control      0.407 ± 0.020
Pk202 transgenic seeds          0.426 ± 0.033

[0171] Table 12. Determination of the T2 seed total fatty acid content of transgenic lines of
pk206 (containing SEQ ID NO:15). Shown are the means (± standard deviation) of 6
individual plants per line.

Genotype                        g total fatty acids/g seed weight
C-24 wild-type control          0.422 ± 0.013
Col-2 wild-type control         0.354 ± 0.026
Col-2 empty vector control      0.388 ± 0.023
Pk206 transgenic seeds          0.414 ± 0.031

[0172] Table 13. Determination of the T2 seed total fatty acid content of transgenic lines of
pk207 (containing SEQ ID NO:17). Shown are the means (± standard deviation) of 6
individual plants per line.

Genotype                        g total fatty acids/g seed weight
C-24 wild-type control          0.371 ± 0.010
Col-2 wild-type control         0.353 ± 0.017
Col-2 empty vector control      0.347 ± 0.024
Pk207 transgenic seeds          0.370 ± 0.009

[0173] Table 14. Determination of the T2 seed total fatty acid content of transgenic lines of
pk209 (containing SEQ ID NO:19). Shown are the means (± standard deviation) of 6
individual plants per line.

Genotype                        g total fatty acids/g seed weight
C-24 wild-type control          0.400 ± 0.001
Col-2 wild-type control         0.369 ± 0.043
Pk209 transgenic seeds          0.397 ± 0.007

[0174] Table 15. Determination of the T2 seed total fatty acid content of transgenic lines of
pk215 (containing SEQ ID NO:21). Shown are the means (± standard deviation) of 6
individual plants per line.

Genotype                        g total fatty acids/g seed weight
C-24 wild-type control          0.373 ± 0.045
Col-2 wild-type control         0.344 ± 0.026
Col-2 empty vector control      0.346 ± 0.019
Pk215 transgenic seeds          0.401 ± 0.014

[0175] Table 16. Determination of the T3 seed total fatty acid content of transgenic lines of
pk239 (containing SEQ ID NO:23). Shown are the means (± standard deviation) of 14-20
individual plants per line.

Genotype                        g total fatty acids/g seed weight
C-24 wild-type control          0.334 ± 0.030
Col-2 empty vector control      0.301 ± 0.027
Pk239-2 transgenic seeds        0.335 ± 0.028
Pk239-9 transgenic seeds        0.335 ± 0.018
Pk239-18 transgenic seeds       0.331 ± 0.026
Pk239-20 transgenic seeds       0.343 ± 0.022

[0176] Table 17. Determination of the T3 seed total fatty acid content of transgenic lines of
pk240 (containing SEQ ID NO:25). Shown are the means (± standard deviation) of 10-20
individual plants per line.

Genotype                        g total fatty acids/g seed weight
C-24 wild-type control          0.393 ± 0.037
Col-2 empty vector control      0.342 ± 0.024
Pk240-3 transgenic seeds        0.373 ± 0.033
Pk240-6 transgenic seeds        0.388 ± 0.015
Pk240-10 transgenic seeds       0.393 ± 0.025

[0177] Table 18. Determination of the T2 seed total fatty acid content of transgenic lines of
pk241 (containing SEQ ID NO:27). Shown are the means (± standard deviation) of 10
(controls) and 30 (pk241) individual plants per line, respectively.

Genotype                        g total fatty acids/g seed weight
Col-2 wild-type control         0.312 ± 0.033
Col-2 empty vector control      0.305 ± 0.025
Pk241 transgenic seeds          0.336 ± 0.032

[0178] Table 19. Determination of the T2 seed total fatty acid content of transgenic lines of
pk242 (containing SEQ ID NO:29). Shown are the means (± standard deviation) of 6
individual plants per line.

Genotype                        g total fatty acids/g seed weight
Col-2 wild-type control         0.344 ± 0.016
Col-2 empty vector control      0.333 ± 0.040
Pk242 transgenic seeds          0.364 ± 0.008

[0179] Table 20. Determination of the T2 seed total fatty acid content of transgenic lines of
Bn011 (containing SEQ ID NO:31). Shown are the means (± standard deviation) of 14-20
individual plants per line.

Genotype                        g total fatty acids/g seed weight
C-24 wild-type control          0.334 ± 0.028
Col-2 wild-type control         0.286 ± 0.039
Col-2 empty vector control      0.291 ± 0.034
Bn011 transgenic seeds          0.308 ± 0.030

[0180] Table 21. Determination of the T2 seed total fatty acid content of transgenic lines of
Bn077 (containing SEQ ID NO:33). Shown are the means (± standard deviation) of 8-17
individual plants per line.

Genotype                        g total fatty acids/g seed weight
C-24 wild-type control          0.366 ± 0.056
Col-2 wild-type control         0.290 ± 0.047
Col-2 empty vector control      0.292 ± 0.038
Bn077 transgenic seeds          0.314 ± 0.032

[0181] Table 22. Determination of the T2 seed total fatty acid content of transgenic lines of
Jb001 (containing SEQ ID NO:35). Shown are the means (± standard deviation) of 3
individual control plants and 2 individual plants per line.

Genotype                        g total fatty acids/g seed weight
Col-0 empty vector control      0.241 ± 0.012
Jb001 transgenic seeds          0.274 ± 0.003

[0182] Table 23. Determination of the T2 seed total fatty acid content of transgenic lines of
Jb002 (containing SEQ ID NO:37). Shown are the means (± standard deviation) of 3
individual control plants and 5 individual plants per line.

Genotype                        g total fatty acids/g seed weight
Col-0 empty vector control      0.191 ± 0.044
Jb002 transgenic seeds          0.273 ± 0.020

[0183] Table 24. Determination of the T2 seed total fatty acid content of transgenic lines of
Jb003 (containing SEQ ID NO:39). Shown are the means (± standard deviation) of 3
individual control plants and 2 individual plants per line.

Genotype                        g total fatty acids/g seed weight
Col-0 empty vector control      0.267 ± 0.011
Jb003 transgenic seeds          0.297 ± 0.030

[0184] Table 25. Determination of the T2 seed total fatty acid content of transgenic lines of
Jb005 (containing SEQ ID NO:41). Shown are the means (± standard deviation) of 3
individual control plants and 7 individual plants per line.

Genotype                        g total fatty acids/g seed weight
Col-0 empty vector control      0.229 ± 0.021
Jb005 transgenic seeds          0.264 ± 0.010

[0185] Table 26. Determination of the T2 seed total fatty acid content of transgenic lines of
Jb007 (containing SEQ ID NO:43). Shown are the means (± standard deviation) of 3
individual control plants and 5 individual plants per line.

Genotype                        g total fatty acids/g seed weight
Col-0 empty vector control      0.296 ± 0.017
Jb007 transgenic seeds          0.320 ± 0.002

[0186] Table 27. Determination of the T2 seed total fatty acid content of transgenic lines of
Jb009 (containing SEQ ID NO:45). Shown are the means (± standard deviation) of 3
individual control plants and 3 individual plants per line.

Genotype                        g total fatty acids/g seed weight
Col-0 empty vector control      0.227 ± 0.016
Jb009 transgenic seeds          0.238 ± 0.004

[0187] Table 28. Determination of the T2 seed total fatty acid content of transgenic lines of
Jb013 (containing SEQ ID NO:47). Shown are the means (± standard deviation) of 3
individual control plants and 4 individual plants per line.

Genotype                        g total fatty acids/g seed weight
Col-0 empty vector control      0.243 ± 0.011
Jb013 transgenic seeds          0.262 ± 0.007

[0188] Table 29. Determination of the T2 seed total fatty acid content of transgenic lines of
Jb017 (containing SEQ ID NO:51). Shown are the means (± standard deviation) of 3
individual control plants and 2 individual plants per line.

Genotype                        g total fatty acids/g seed weight
Col-0 empty vector control      0.231 ± 0.020
Jb017 transgenic seeds          0.269 ± 0.022

[0189] Table 30. Determination of the T2 seed total fatty acid content of transgenic lines of
Jb027 (containing SEQ ID NO:55). Shown are the means (± standard deviation) of 3
individual control plants and 2 individual plants per line.

Genotype                        g total fatty acids/g seed weight
Col-0 empty vector control      0.235 ± 0.052
Jb027 transgenic seeds          0.282 ± 0.014

[0190] Table 31. Determination of the T2 seed total fatty acid content of transgenic lines of
OO-1 (containing SEQ ID NO:57). Shown are the means (± standard deviation) of 3
individual control plants and 7 individual plants per line.

Genotype                        g total fatty acids/g seed weight
Col-0 empty vector control      0.250 ± 0.009
OO-1 transgenic seeds           0.274 ± 0.007

[0191] Table 32. Determination of the T2 seed total fatty acid content of transgenic lines of
OO-4 (containing SEQ ID NO:63). Shown are the means (± standard deviation) of 2
individual control plants and 4 individual plants per line.

Genotype                        g total fatty acids/g seed weight
Col-0 empty vector control      0.329 ± 0.041
OO-4 transgenic seeds           0.380 ± 0.015

[0192] Table 33. Determination of the T2 seed total fatty acid content of transgenic lines of
OO-8 (containing SEQ ID NO:69). Shown are the means (± standard deviation) of 4
individual control plants and 2 individual plants per line.

Genotype                        g total fatty acids/g seed weight
Col-0 empty vector control      0.379 ± 0.009
OO-8 transgenic seeds           0.411 ± 0.008

[0193] Table 34. Determination of the T2 seed total fatty acid content of transgenic lines of
OO-9 (containing SEQ ID NO:71). Shown are the means (± standard deviation) of 3
individual control plants and 4 individual plants per line.

Genotype                        g total fatty acids/g seed weight
Col-0 empty vector control      0.315 ± 0.020
OO-9 transgenic seeds           0.333 ± 0.006

[0194] Table 35. Determination of the T2 seed total fatty acid content of transgenic lines of
OO-11 (containing SEQ ID NO:75). Shown are the means (± standard deviation) of 3
individual control plants and 2 individual plants per line.

Genotype                        g total fatty acids/g seed weight
Col-0 empty vector control      0.264 ± 0.003
OO-11 transgenic seeds          0.278 ± 0.003

[0195] Table 36. Determination of the T2 seed total fatty acid content of transgenic lines of
OO-12 (containing SEQ ID NO:77). Shown are the means (± standard deviation) of 3
individual control plants and 9 individual plants per line.

Genotype                        g total fatty acids/g seed weight
Col-0 empty vector control      0.290 ± 0.010
OO-12 transgenic seeds          0.316 ± 0.008

[0196] Table 37. Determination of the T4 seed total fatty acid content of transgenic lines of
pp82 (containing SEQ ID NO:79). Shown are the means (± standard deviation) of 17-20
individual plants per line.

Genotype                        g total fatty acids/g seed weight
C-24 wild-type control          0.436 ± 0.050
Col-2 wild-type control         0.380 ± 0.020
Col-2 empty vector control      0.378 ± 0.030
pp82-15-16 transgenic seeds     0.432 ± 0.040
pp82-15-19 transgenic seeds     0.437 ± 0.040
pp82-16-10 transgenic seeds     0.430 ± 0.040
pp82-9-14 transgenic seeds      0.449 ± 0.040

[0197] Table 38. Determination of the T4 seed total fatty acid content of transgenic lines of
pk225 (containing SEQ ID NO:81). This particular gene has been down-regulated. Shown are
the means (± standard deviation) of 17-20 individual plants per line.

Genotype                        g total fatty acids/g seed weight
C-24 wild-type control          0.344 ± 0.048
Col-2 empty vector control      0.327 ± 0.031
Pk225-11-19 transgenic seeds    0.350 ± 0.041
Pk225-19-8 transgenic seeds     0.351 ± 0.021
Pk225-7-6 transgenic seeds      0.354 ± 0.037
Pk225-9-10 transgenic seeds     0.363 ± 0.042

Table 39. Determination of the T2 seed total fatty acid content of transgenic lines of OO-3
(containing SEQ ID NO:61). Shown are the means (± standard deviation) of 4 individual
control plants and 6 individual plants per line.

Genotype                        g total fatty acids/g seed weight
Col-0 empty vector control      0.365 ± 0.006
OO-3 transgenic seeds           0.388 ± 0.006

Example 16 

Analysis of the Impact of Recombinant Proteins on the Production of a Desired Seed Storage
Compound

The effect of the genetic modification in plants on a desired seed storage
compound (such as a sugar, lipid or fatty acid) can be assessed by growing the modified plant
under suitable conditions and analyzing the seeds or any other plant organ for increased
production of the desired product (i.e., a lipid or a fatty acid). Such analysis techniques are
well known to one skilled in the art, and include spectroscopy, thin layer chromatography,
staining methods of various kinds, enzymatic and microbiological methods, and analytical
chromatography such as high performance liquid chromatography (see, for example, Ullman,
1985, Encyclopedia of Industrial Chemistry, vol. A2, pp. 89-90 and 443-613, VCH:
Weinheim; Fallon, A. et al., 1987, Applications of HPLC in Biochemistry in: Laboratory
Techniques in Biochemistry and Molecular Biology, vol. 17; Rehm et al., 1993, Product
recovery and purification, Biotechnology, vol. 3, Chapter III, pp. 469-714, VCH: Weinheim;
Belter, P.A. et al., 1988, Bioseparations: downstream processing for biotechnology, John
Wiley & Sons; Kennedy J.F. & Cabral J.M.S., 1992, Recovery processes for biological
materials, John Wiley and Sons; Shaeiwitz J.A. & Henry J.D., 1988, Biochemical separations
in: Ullmann's Encyclopedia of Industrial Chemistry, Separation and purification techniques in
biotechnology, vol. B3, Chapter 11, pp. 1-27, VCH: Weinheim; and Dechow F.J. 1989).

[0199] Besides the above-mentioned methods, plant lipids are extracted from plant
material as described by Cahoon et al. (1999, Proc. Natl. Acad. Sci. USA 96, 22:12935-12940)
and Browse et al. (1986, Anal. Biochemistry 152:141-145). Qualitative and
quantitative lipid or fatty acid analysis is described in Christie, William W., Advances in
Lipid Methodology. Ayr, Scotland: Oily Press (Oily Press Lipid Library); Christie, William
W., Gas Chromatography and Lipids. A Practical Guide. Ayr, Scotland: Oily Press, 1989,
repr. 1992, IX, 307 pp. (Oily Press Lipid Library); and "Progress in Lipid Research", Oxford:
Pergamon Press, vols. 1 (1952) to 16 (1977) published as Progress in the Chemistry of Fats
and Other Lipids.

[0200] Unequivocal proof of the presence of fatty acid products can be obtained by
the analysis of transgenic plants following standard analytical procedures: GC, GC-MS or
TLC as variously described by Christie and references therein (1997 in: Advances on Lipid
Methodology 4th ed.: Christie, Oily Press, Dundee, pp. 119-169; 1998). Detailed methods are
described for leaves by Lemieux et al. (1990, Theor. Appl. Genet. 80:234-240) and for seeds
by Focks & Benning (1998, Plant Physiol. 118:91-101).

[0201] Positional analysis of the fatty acid composition at the C-1, C-2 or C-3
positions of the glycerol backbone is determined by lipase digestion (see, e.g., Siebertz &
Heinz 1977, Z. Naturforsch. 32c:193-205, and Christie, 1987, Lipid Analysis 2nd Edition,
Pergamon Press, Exeter, ISBN 0-08-023791-6).

[0202] A typical way to gather information regarding the influence of increased or
decreased protein activities on lipid and sugar biosynthetic pathways is, for example, to
analyze the carbon fluxes by labeling studies with leaves or seeds using 14C-acetate or
14C-pyruvate (see, e.g., Focks & Benning, 1998, Plant Physiol. 118:91-101; Eccleston &
Ohlrogge, 1998, Plant Cell 10:613-621). The distribution of carbon-14 into lipids and
aqueous soluble components can be determined by liquid scintillation counting after the
respective separation (for example on TLC plates) including standards like 14C-sucrose and
14C-malate (Eccleston & Ohlrogge, 1998, Plant Cell 10:613-621).
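
The distribution of label described above reduces to simple arithmetic once the counts are available. The following minimal sketch converts scintillation counts per fraction into a percentage distribution after background subtraction. The fraction names, counts, and background value are hypothetical and serve only to illustrate the calculation.

```python
def label_distribution(fraction_dpm, background_dpm=0.0):
    """Percent of recovered carbon-14 in each fraction.

    fraction_dpm maps a fraction name to its measured dpm; the same
    background count is subtracted from every fraction.
    """
    net = {
        name: max(dpm - background_dpm, 0.0)
        for name, dpm in fraction_dpm.items()
    }
    total = sum(net.values())
    if total == 0:
        raise ValueError("no label recovered above background")
    return {name: 100.0 * value / total for name, value in net.items()}


# Hypothetical counts for one labeled seed sample.
counts = {"lipid": 1550.0, "aqueous": 4550.0}
shares = label_distribution(counts, background_dpm=50.0)
for name, percent in shares.items():
    print(f"{name}: {percent:.1f}% of recovered label")
```

Expressing each fraction as a share of the recovered label makes samples with different total uptake comparable.
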

[0203] Material to be analyzed can be disintegrated via sonication, glass milling,
liquid nitrogen and grinding, or via other applicable methods. The material has to be
centrifuged after disintegration. The sediment is resuspended in distilled water, heated for 10
minutes at 100°C, cooled on ice and centrifuged again, followed by extraction in 0.5 M
sulfuric acid in methanol containing 2% dimethoxypropane for 1 hour at 90°C, which
hydrolyzes the oil and lipid compounds and yields transmethylated lipids. These fatty acid
methyl esters are extracted in petroleum ether and finally subjected to GC analysis using a
capillary column (Chrompack, WCOT Fused Silica, CP-Wax-52 CB, 25 m, 0.32 mm) with a
temperature gradient from 170°C to 240°C over 20 minutes, followed by 5 minutes at 240°C. The
identity of the resulting fatty acid methyl esters is defined by the use of standards available from
commercial sources (e.g., Sigma).
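
The GC oven program described above can be written down as a small function, which makes the implied ramp rate explicit. This is a minimal sketch that assumes the gradient is linear, which the text does not state; the function and parameter names are hypothetical.

```python
def oven_temperature(t_min, start_c=170.0, end_c=240.0,
                     ramp_min=20.0, hold_min=5.0):
    """Oven temperature (deg C) at time t_min, assuming a linear ramp."""
    if t_min < 0 or t_min > ramp_min + hold_min:
        raise ValueError("time outside the 25-minute program")
    if t_min >= ramp_min:
        return end_c  # final hold at the upper temperature
    return start_c + (end_c - start_c) * t_min / ramp_min


# Under the linear-ramp assumption: (240 - 170) / 20 = 3.5 deg C per minute.
ramp_rate = (240.0 - 170.0) / 20.0
print(f"ramp rate: {ramp_rate} deg C/min")
for minute in (0, 10, 20, 25):
    print(minute, oven_temperature(minute))
```

A linear ramp is the usual reading of such a description, but the instrument method should be checked before retention times are compared between runs.
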

[0204] In the case of fatty acids where standards are not available, molecule identity is
shown via derivatization and subsequent GC-MS analysis. For example, the localization of
triple bond fatty acids is shown via GC-MS after derivatization to 4,4-dimethoxyoxazoline
derivatives (Christie, Oily Press, Dundee, 1998).

[0205] A common standard method for analyzing sugars, especially starch, was
published by Stitt M., Lilley R.Mc.C, Gerhardt R. and Heldt M.W. (1989, "Determination of
metabolite levels in specific cells and subcellular compartments of plant leaves," Methods
Enzymol. 174:518-552; for other methods, see also Hartel et al., 1998, Plant Physiol.
Biochem. 36:407-417 and Focks & Benning, 1998, Plant Physiol. 118:91-101).

[0206] For the extraction of soluble sugars and starch, 50 seeds are homogenized in
500 µl of 80% (v/v) ethanol in a 1.5-ml polypropylene test tube and incubated at 70°C for 90
minutes. Following centrifugation at 16,000 g for 5 minutes, the supernatant is transferred to
a new test tube. The pellet is extracted twice with 500 µl of 80% ethanol. The solvent of the
combined supernatants is evaporated at room temperature under a vacuum. The residue is
dissolved in 50 µl of water, representing the soluble carbohydrate fraction. The pellet left
from the ethanol extraction, which contains the insoluble carbohydrates including starch, is
homogenized in 200 µl of 0.2 N KOH, and the suspension is incubated at 95°C for 1 hour to
dissolve the starch. Following the addition of 35 µl of 1 N acetic acid and centrifugation for
5 minutes at 16,000 g, the supernatant is used for starch quantification.
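
The volumes in the extraction above imply a concentration factor for the soluble fraction and an acid-base balance for the starch fraction. The following minimal sketch works both out from the stated volumes and normalities; the variable names are hypothetical and nothing beyond the figures in the paragraph is assumed.

```python
# Soluble fraction: one initial extraction plus two re-extractions of 500 µl each.
ethanol_extract_ul = 500.0 * 3
final_volume_ul = 50.0
concentration_factor = ethanol_extract_ul / final_volume_ul

# Starch fraction: µl x N (eq/l) gives micro-equivalents.
koh_ueq = 200.0 * 0.2      # 200 µl of 0.2 N KOH
acid_ueq = 35.0 * 1.0      # 35 µl of 1 N acetic acid
residual_base_ueq = koh_ueq - acid_ueq
starch_fraction_ul = 200.0 + 35.0

print(f"soluble sugars are concentrated {concentration_factor:.0f}-fold")
print(f"base remaining after acid addition: {residual_base_ueq:.1f} µeq "
      f"in {starch_fraction_ul:.0f} µl")
```

On these figures the acetic acid offsets most, but not all, of the KOH, so the starch fraction stays slightly alkaline before the assay; any buffering by the sample itself is ignored here.
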

[0207] To quantify soluble sugars, 10 µl of the sugar extract is added to 990 µl of
reaction buffer containing 100 mM imidazole, pH 6.9, 5 mM MgCl2, 2 mM NADP, 1 mM
ATP, and 2 units ml⁻¹ of glucose-6-P-dehydrogenase. For enzymatic determination of
glucose, fructose, and sucrose, 4.5 units of hexokinase, 1 unit of phosphoglucoisomerase, and
2 µl of a saturated fructosidase solution are added in succession. The production of NADPH
is photometrically monitored at a wavelength of 340 nm. Similarly, starch is assayed in 30 µl
of the insoluble carbohydrate fraction with a kit from Boehringer Mannheim.
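
The absorbance change at 340 nm can be converted into an amount of sugar with the Beer-Lambert law. The following minimal sketch assumes the commonly used NADPH extinction coefficient of 6.22 mM^-1 cm^-1, a 1 cm light path, and one NADPH formed per hexose unit (so sucrose counts as two hexose equivalents); none of these values is stated in the text, and the function name is hypothetical. The volumes and seed number are those given above.

```python
NADPH_EXTINCTION_MM = 6.22  # mM^-1 cm^-1 at 340 nm (assumed literature value)


def hexose_nmol_per_seed(delta_a340, assay_volume_ul=1000.0, aliquot_ul=10.0,
                         extract_volume_ul=50.0, n_seeds=50, path_cm=1.0):
    """Hexose equivalents per seed from the absorbance change at 340 nm."""
    # Beer-Lambert: concentration in mM, which equals nmol per µl.
    conc_mm = delta_a340 / (NADPH_EXTINCTION_MM * path_cm)
    nmol_in_cuvette = conc_mm * assay_volume_ul
    # Scale from the assayed aliquot to the whole soluble fraction.
    nmol_in_extract = nmol_in_cuvette * extract_volume_ul / aliquot_ul
    return nmol_in_extract / n_seeds


# Hypothetical reading: absorbance rises by 0.311 after hexokinase is added.
print(f"{hexose_nmol_per_seed(0.311):.2f} nmol glucose per seed")
```

Because the enzymes are added in succession, each absorbance step is evaluated separately: the first step reports glucose, the second fructose, and the third the hexoses released from sucrose.
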

[0208] An example for analyzing the protein content in leaves and seeds is described
by Bradford M.M. (1976, "A rapid and sensitive method for the quantification of microgram
quantities of protein using the principle of protein dye binding," Anal. Biochem. 72:248-254).
For quantification of total seed protein, 15-20 seeds are homogenized in 250 µl of acetone in
a 1.5-ml polypropylene test tube. Following centrifugation at 16,000 g, the supernatant is
discarded and the vacuum-dried pellet is resuspended in 250 µl of extraction buffer
containing 50 mM Tris-HCl, pH 8.0, 250 mM NaCl, 1 mM EDTA, and 1% (w/v) SDS.
Following incubation for 2 h at 25°C, the homogenate is centrifuged at 16,000 g for 5 min
and 200 µl of the supernatant is used for protein measurements. In the assay, γ-globulin
is used for calibration. For protein measurements, the Lowry DC protein assay (Bio-Rad) or
the Bradford assay (Bio-Rad) is used.
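
Calibration against γ-globulin amounts to fitting a standard curve and reading the unknown from it. The following minimal sketch fits a straight line to standards and scales the result to protein per seed. The absorbance values, the 10 µl assayed volume, and the assumption of a linear response are all hypothetical; only the 250 µl extraction volume and the seed number range come from the text.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def protein_ug_per_seed(absorbance, slope, intercept, assayed_ul,
                        extract_ul=250.0, n_seeds=20):
    """Protein per seed, read from the standard curve and scaled up."""
    ug_in_assay = (absorbance - intercept) / slope
    return ug_in_assay * extract_ul / assayed_ul / n_seeds


# Hypothetical gamma-globulin standards (µg) and their absorbance readings.
standards_ug = [0.0, 5.0, 10.0, 20.0]
readings = [0.05, 0.15, 0.25, 0.45]
slope, intercept = fit_line(standards_ug, readings)

sample = protein_ug_per_seed(0.25, slope, intercept, assayed_ul=10.0)
print(f"{sample:.1f} µg protein per seed")
```

Samples reading above the highest standard should be diluted and measured again rather than extrapolated, since dye-binding assays flatten at high protein loads.
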

[0209] Enzymatic assays of hexokinase and fructokinase are performed
spectrophotometrically according to Renz et al. (1993, Planta 190:156-165); enzymatic assays of
phosphogluco-isomerase, ATP-dependent 6-phosphofructokinase, pyrophosphate-dependent
6-phospho-fructokinase, fructose-1,6-bisphosphate aldolase, triose phosphate isomerase,
glyceral-3-P dehydrogenase, phosphoglycerate kinase, phosphoglycerate mutase, enolase and
pyruvate kinase are performed according to Burrell et al. (1994, Planta 194:95-101); and
enzymatic assays of UDP-glucose-pyrophosphorylase according to Zrenner et al. (1995,
Plant J. 7:97-107).

[0210] Intermediates of the carbohydrate metabolism, like glucose-1-phosphate,
glucose-6-phosphate, fructose-6-phosphate, phosphoenolpyruvate, pyruvate, and ATP are
measured as described in Hartel et al. (1998, Plant Physiol. Biochem. 36:407-417), and
metabolites are measured as described in Jelitto et al. (1992, Planta 188:238-244).
[0211] In addition to the measurement of the final seed storage compound (i.e., lipid, 

starch or storage protein), it is also possible to analyze other components of the metabolic 
pathways utilized for the production of a desired seed storage compound, such as 
intermediates and side-products, to determine the overall efficiency of production of the 
compound (Fiehn et al., 2000, Nature Biotech. 18:1157-1161). 

[0212] For example, yeast expression vectors comprising the nucleic acids disclosed 

herein, or fragments thereof, can be constructed and transformed into Saccharomyces 
cerevisiae using standard protocols. The resulting transgenic cells can then be assayed for 
alterations in sugar, oil, lipid, or fatty acid contents. 

[0213] Similarly, plant expression vectors comprising the nucleic acids disclosed 

herein, or fragments thereof, can be constructed and transformed into an appropriate plant 
cell such as Arabidopsis, soybean, rape, maize, wheat, Medicago truncatula, etc., using 
standard protocols. The resulting transgenic cells and/or plants derived therefrom can then be 
assayed for alterations in sugar, oil, lipid, or fatty acid contents. 

[0214] Additionally, the sequences disclosed herein, or fragments thereof, can be used 

to generate knockout mutations in the genomes of various organisms, such as bacteria, 
mammalian cells, yeast cells, and plant cells (Girke et al., 1998, Plant J. 15:39-48). The 
resultant knockout cells can then be evaluated for their composition and content in seed 
storage compounds, and for the effect of the mutation on the phenotype and/or genotype. 
Other methods of gene inactivation are described in U.S. Patent No. 6,004,804 and in 
Puttaraju et al. (1999, Nature Biotech. 17:246-252). 



Example 17 

Purification of the Desired Product from Transformed Organisms 

[0215] An LMP can be recovered from plant material by various methods well known 

in the art. Organs of plants can be separated mechanically from other tissue or organs prior to 
isolation of the seed storage compound from the plant organ. Following homogenization of 
the tissue, cellular debris is removed by centrifugation and the supernatant fraction containing 
the soluble proteins is retained for further purification of the desired compound. If the 
product is secreted from cells grown in culture, then the cells are removed from the culture by 
low-speed centrifugation, and the supernatant fraction is retained for further purification. 
[0216] The supernatant fraction from either purification method is subjected to 

chromatography with a suitable resin, in which the desired molecule is either retained on a 
chromatography resin while many of the impurities in the sample are not, or where the 
impurities are retained by the resin while the sample is not. Such chromatography steps may 
be repeated as necessary, using the same or different chromatography resins. One skilled in 
the art would be well-versed in the selection of appropriate chromatography resins and in 
their most efficacious application for a particular molecule to be purified. The purified 
product may be concentrated by filtration or ultrafiltration, and stored at a temperature at 
which the stability of the product is maximized. 
[0217] There is a wide array of purification methods known in the art, and the 

preceding method of purification is not meant to be limiting. Such purification techniques 
are described, for example, in Bailey J.E. & Ollis D.F., 1986, Biochemical Engineering 
Fundamentals, McGraw-Hill: New York. 

[0218] The identity and purity of the isolated compounds may be assessed by 

techniques standard in the art. These include high-performance liquid chromatography 
(HPLC), spectroscopic methods, staining methods, thin layer chromatography, analytical 
chromatography such as high performance liquid chromatography, NIRS, enzymatic assay, or 
microbiological methods. Such analysis methods are reviewed in: Patek et al. (1994, Appl. Environ. 
Microbiol. 60:133-140), Malakhova et al. (1996, Biotekhnologiya 11:27-32), Schmidt et al. 
(1998, Bioprocess Engineer 19:67-70), Ullmann's Encyclopedia of Industrial Chemistry 
(1996, Vol. A27, VCH: Weinheim, p. 89-90, p. 521-540, p. 540-547, p. 559-566, p. 575-581 
and p. 581-587), Michal G. (1999, Biochemical Pathways: An Atlas of Biochemistry and 
Molecular Biology, John Wiley and Sons) and Fallon, A. et al. (1987, Applications of HPLC in 
Biochemistry, in: Laboratory Techniques in Biochemistry and Molecular Biology, vol. 17). 

Example 18 

Screening for increased stress tolerance and plant growth 

[0219] The transgenic plants are screened for improved stress tolerance, 

demonstrating that transgene expression confers stress tolerance. The transgenic plants are 
further screened for their growth rate, demonstrating that transgene expression confers 
increased growth rates and/or increased seed yield. 

[0220] Classification of the proteins was done by BLAST searches against the BLOCKS 

database (S. Henikoff & J. G. Henikoff, "Protein family classification based on searching a 
database of blocks", Genomics 19:97-107 (1994)). 

[0221] Those skilled in the art will recognize, or will be able to ascertain using no 

more than routine experimentation, many equivalents to the specific embodiments of the 
invention described herein. Such equivalents are intended to be encompassed by the claims 
to the invention disclosed and claimed herein. 
Appendix A 

SEQ ID NO:1, Nucleotide sequence of the open reading frame of Pk123 
ATGGCAATCTTCCGAAGTACACTAGTTTTACTGCTGATCCTCTTCTGCCTCACCAC 

aaaTtoatocgotg^gagatgca^ 

AATCroTGT^GAGAGCATGCAACAGCTGTTGTTACCGCTGCAACTGTGTGCCAC 
CAGGCAC^^ 

TGGTGGCCGTCTCAAGTGCCCTTAA 

SEQ ID NO:2, Deduced amino acid sequence of the open reading frame of Pk123 
MAIFRSTLVLLLILFCLTTFELHVHAAEDSQVGEGVVKIDCGGRCKGRCSKSSRPNLC 

LRACNSCCYRCNCVPPGTAGNHHLCPCYASITTRGGRLKCP 
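
The relationship between a nucleotide open reading frame and its deduced amino acid sequence, as in SEQ ID NO:1 and SEQ ID NO:2, can be illustrated with a minimal sketch. It assumes the standard genetic code and reading frame 0, and it uses only the first printed line of SEQ ID NO:1 as input. The function name `translate` and the handling of unreadable bases are illustrative choices, not part of the disclosure.

```python
# Sketch: translate an open reading frame with the standard genetic code.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate(orf):
    """Translate codon by codon from position 0; stop at the first stop codon.

    A trailing partial codon is ignored; a codon containing any character
    other than A, C, G or T is reported as 'X'.
    """
    orf = orf.upper()
    protein = []
    for pos in range(0, len(orf) - 2, 3):
        residue = CODON_TABLE.get(orf[pos:pos + 3], "X")
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)

# First printed line of SEQ ID NO:1.
first_line = "ATGGCAATCTTCCGAAGTACACTAGTTTTACTGCTGATCCTCTTCTGCCTCACCAC"
print(translate(first_line))  # MAIFRSTLVLLLILFCLT
```

The 18 residues obtained this way correspond to the start of SEQ ID NO:2. The same check can be applied to the other nucleotide and amino acid pairs in this appendix wherever the printed sequence is legible.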

SEQ ID NO:3, Nucleotide sequence of the open reading frame of Pk197 
ATGGAGAATGGAGCAACGACGACGAGCACAATTACCATCAAAGGGATTCTGAGT 

TTGCTAATGGAAAGCATCACAACAGAGGAAGATGAAGGAGGAAAGAGAGTAAT 

ATCTCTGGGAATGGGAGACCCAACACTCTACTCGTGTTTTCGTACAACACAAGTC 

TCTCTTCAAGCTGTTTCTGATTCTCTTCTCTCCAACAA 

ACCGTCGOTOTCCCCAAGCTCGAAGGGCAATA^^ 

TTCCATACAAACTTTCACAGGATGATGTGTTTATCACA 

SItS^a^gtcgatgttagct^ 

AGGCCTGGTTTCCCAATCTATGAACTCTGTGCTAAGTTTAGACACCTTGAAGTTC 
GCTACGTrcGATCTTCTTCCGGAAAATGGATGGGAGATCGATCTTGATGCTGTCGA. 

GGCTCTTGCAGACG^ 

TGCGGGAATGTCTATAGCTACCAGCATTTGATGAAGATTGCGGAATCGGCGAAA 
AAACTAGGGTTTCTTGTGATTGCTGATGAGGTTTACGGTCATCTTGCT'^^ 

CAAACCGmCT^ 

ScTCm^AAAGAGATGGATAGTTCCAGGTTGGCGACTCGGGTGG™ 
CACTGATCCTTCTGGTTCCTTTAAGGACCCTAAGATCATTGAGAGGTTTAAGAAA 

TACTTTOATA^CITGGTGGACCAGCTACATTTATTCAGGCT 

mGGAACAGACGGATGAGT^ 

TCTOG^ATAmOTGWTGGATCAAGGAGATTCCTTGCATTGATO^ 

TCGACCAGAAGGATCCATGGCAATGATGGTTAAGCTGAATCTCTCATTACTTGAA 

GATGTAAGTGACGATATCGACTTCTGTTTCAAGTTAGCTAGGGAAGAATCAGTCA 

TCCTTCTrCCTGGGACCGCGGTGGGGCTGAAGAACTGGCTGAGGATAACGTTTGC 

AGCAGATGCAACTTCGATTGAAGAAGCTTTTAAAAGGATCAAATGTTTCTATCTT 

AGACATGCCAAGACTCAATATCCAACCATATAG 

SEQ ID NO:4, Deduced amino acid sequence of the open reading frame of Pk197 
MENGATTTSTITIKGILSLLMESITTEEDEGGKRVISLGMGDPTLYSCFRTT 

SDSLLSNKFHGYSPTVGLPQARRAIAEYLSRDL^ 

LARPRAMLLPRPGFPIYELCAKFRHLEVRYVDLLPENGWEIDLDAVEALADENTVAL 

VVINPGNPCGNWSYQHLMKIAESAKKLGFLVIADEWGHLAFGSKPFWMGVFGSI 

WVLTLGSLSKRWIVPGWRLGWVTTDPSGSFKDPKIIERFKKYFDILGGPATFIQAA^ 

VPTILEQTDESFFKKTLNSLKNSSDICCDWIKEIPCro 

DVSDDmFCFKLAREESVILLPGTAVGLKNWI^FAADATSIEEAFKRIKCFYLRHAK 
TQYPTI 
SEQ ID NO:5, Nucleotide sequence of the open reading frame of Pk136 
AT^arTGAAAAAGTAAAGTCTGGTCAAGTTTTTAACCTATTATGCATATTCTCGA 

GAGCGGTGCCATCTGAAGATAAAACGACGACTGTTTGGCT 
GGTCGGGTAAAA^ 

AGCCGGTGA^TATGAAAGAGGCGGC 

CGACGGTGG^ 

A^GCTGTGAAAGAA 

CGGAGGGATCTGAGGAGCTATAA 

SEQ ID NO:6, Deduced amino acid sequence of the open reading frame of Pk136 

MAEK\^SGQWNLLC 

H^^VMTLDRGQSHFFPPNTYFTGKNDAPMGAGENMKEAATRSFEHSKATVE 
EAARSAAEWSDTAEAVKEKVKRSVSGGVTQPSEGSEEL 

SEQ ID NO:7, Nucleotide sequence of the open reading frame of Pk156 
ATGGCTGGAGAAGAAATAGAGAGGGAGAAGAAATCTGCAGCATCTGCAAGAAC 

tScaccaga^caacactcaacaaagto 

CTCCTGGTAACGTTCGTCGGAGTTTTAGCATGGGTTTATCAAACAATCCAACCAC 

c£Xc?ga^c^^^ 

CGAAGCCAAGTTCAAGATCATAAACATCCACGGCTT 
'TCGCATTTTC^ 

Sctg^c^g^ 
IggSS^S 

A^^?^CACGT^ 

CAA^CGGGCATACATAAGACAACAAGGTGAATATGTAAGCTTACACCGAGA^ 
TGAATGTCGCArVrTCAAGCTGGGAGTTTGATC^ 

g C c£g^^ 

GAAATATCTGGATCAGGACA 

TCATCAAGTCACTTTTGGTTGGGGAAGAAGATGTAAGTGAGAGTAGAGAAGCCT 
CTGTTTAA 

SEQ ID NO:8, Deduced amino acid sequence of the open reading frame of Pk156 

MAGEEIEREKKSAASARTICTRNNTQQSS 

KIVGSPGGPTWSPPJKLRDGRHLAYTEFGIPPJDEAKFKIIM 

PALVEELRIYWSFDRPGYGESDPNLNGSPRSIALDIEELADGIX3LGPQFYLFGYSMGG 
EITWACLNYIPHRLAGAAiVAPAINYWWRNLPGDLTREAFSLMHPA 
YAPWLTYWWTQKWFPISNVIAGNPIffSRQDMEILSKXGFVNPNRAYmQQGEYVS 
LHRDLNV AF S S WEFDPLDLQDPFPNNNG S VHVWNGDEDKF VP VKLQRYV ASKXP WI 
RYHEISGSGHFVPFVEGMTDKIIKSLLVGEEDVSESREASV 
SEQ ID NO:9, Nucleotide sequence of the open reading frame of Pk159 
AT^TGGAGTGATGAAOT^ 

GTCCAATCACAGCGAACGCGCTTATGAGTTGTGGCACCGTCAACGGCAACCTGG 

CAGGGTGCATTGCCTACTTGACCCGAGGTGCTCCACTTACCCAAGGGTGCTGCAA 

CGGCGTTACTAACCTTAAAAACATGGCCAGTACAACCCCAGACCGTCAGCAAGC 

^CG^GCClTCAATCTGCCGCTAAAGCCGTrGGTCCCGGTCTCAACACTGCCC 

GTGCAGCTGGACTTCCTAGCGCATGCAAAGTCAATATTCCTTACAAAATCAGCGC 

CAGCACCAACTGCAACACCGTGAGGTGA 

SEQ ID NO:10, Deduced amino acid sequence of the open reading frame of Pk159 

MAGVMKLACMVLACMIVAGPITANALMSCGTVNGNLAGCIAYLTOGAPLTQGCC^ 

GVT^KNMASTTPDRQQAC^CLQSAAKAVGPGLNTARAAGIJ'SACKVNIPYKISAS 

TNCNTVR 

SEQ ID NO:11, Nucleotide sequence of the open reading frame of Pk179 
AjixK^CTTO 

TGTTGTCCTTrGCTGCTTTrCCAGTCGAGATTCCTGGAGAGGTAGTATTTCTTCAT 
CCCGTTCACAACTATGCTCTGATTGCGTATAATCCATCAGCAATGGATCCTGCCA 
GTGCTTCAGTCATTCGTGCAGCTGAGCTACTACCTGAACCTGCACTCCAACGTGG 
AGATTCAGTCTATCTTGTCGGATTGAGTAGGAACCTTCAAGCTACATCAAGAAAA 

TCTA^OTAACCAATCCATGTGCAGCGTTAAACATTGGTTCTG^^ 

^ACAGAGCTACTAATATGGAAGTAATTGAGCTTGATACAGATTTTGGTAGCTCA 

TTTTCAGGGGCGCTGACTGATGAGCAGGGAAGAATTCGGGCTATTTGGGKjAAGT 

TTTTCGACTCAGGTTAAATATAGTTCCACnTCTTCAGAAGACCACCAGTTTGTCA 

AGGTATCCCAGTATATGCAATCAGCCAAGTCCTTGAAAAAATCATAACCGGTGG 

AAATGGACCAGCTCTTCTCATAAATGGTGTCAAAAGGCCAATGCCACTOTTCGG 

ATTTTGGAAGTTGAATTGTATCCTACTTTGCTTTCAAAAGCCCGGAGTTTTGGTCT 

GAGTGATGAATGGATCCAAGTCCTAGTCAAGAAGGATCCTGTTA^ 

TCTGCGTGTTAAAGGTTGCCTGGCAGGATCAAAAGCTGAAAACCTTCTTGAACAA 

GGCGATATGGTTCTGGCAGTCAATAAGATGCCAGTTACATGCTTCAATGACATAG 

AAGCTGCTTGCCAAACATTGGATAAGGGTAGTTACAGCGATGAAAATCTCAATCT 

AACAATCCTTAGACAGGGCCAAGAACTGGAGCTCGTAGTTGGAACTGATAAGAG 

AGATGGGAATGGAACGACAAGAGTGATAAATTGGTGCGGATGCGTTGTTCAGGA 

TCCTCATCCT 

ATGTCACAAGATGGTGTCACGGGAGTCCCGCTCACCGATATGGCCTCTACGCGCT 

TCAATGGATCGTGGAAGTTAATGGGAAGAAGACTCCTGACCTAAACGCATTCGC 

AGATGCTACCAAGGAGCTAGAACACGGGCAGTTTGTGCGTATTAGGACTGTTCAT 

CTAAACGK3CAAGCCACGAGTATTGACCCTGAAACAAGATCTCCATTACTGGCCG 

ACITGGGAATTGAGGTTCGACCCAGAGACTGCTCTTTGGCGGAGAAATATATTGA 

AAGCCTTGCAGTAA 

SEQ ID NO:12, Deduced amino acid sequence of the open reading frame of Pk179 
MGLAVVDKNTVAISASDVMLSFAAFPVEIPGEVWLHPVHNYALIAYNPSAMDPASA 

SVIRAAELLPEPALQRGDSVYLVGLSRNLQATSRKSrVTNPCAALNIGSADS 

NMEVlELDTDFGSSFSGALTDEQGRIRAr^GSFSTQVKYSSTSSEDHQFVRGIPVYAIS 

OVLEKnTGGNGPALLINGVKRPMPLVPJLEVELYPTLLSKARSFGLSDEWIQVLVKK 

DPVRROVLRVKGCLAGSKAENLLEQGDMAO.AVNKMPVTCFNDffiAACQTLDKGSY 

SDENLNLTILRQGQELELWGTDKRDGNGTTOVINWCGCVVQDPHPAVRALGFLPE 

EGHGVYVTRWCHGSPAHRYGLYALQWTVEVNGKKTPDLNAFADATKELEHGQFVR 

mTVHLNGKPRVLTLKQDLHYWPTWELRFDPETALWRRMLKALQ 
SEQ ID NO:13, Nucleotide sequence of the open reading frame of Pk202 
ATCGCGTTCACGGCGCTTGTGTTCATTGTGTTCGTGGTGGGTGTCATGGTTrcTCC 

AGTTTCAATCAGAGCAACTGAGGTCAAACTTTCTGGAG 
GTGTOATGCAOTACAGCTTAGTTCATGCGCAACA^ 

CCGTCTACAGAGTGTTGCGGGAAACTGAAGGAGCAACAGCCGTGTTTTTGTACAT 
ATA^AAAGATCCAAGATATAGTCAATATGTTGGTTCTGCAAATGCTAAGAAAAC 

GTTAGCAACTTGTGGTGTTCCTTATCCTACTTGTTGA 

SEQ ID NO:14, Deduced amino acid sequence of the open reading frame of Pk202 
MAFTALWIVFWGVMVSPVSIRATEVKLSGGEADVTCDAVQLSSCATPMLTGVPPS 

TECCGKLKEQQPCFCTYIKDPRYSQYVGSANAKKTLATCGVPYPTC 

TACACCACAAGCGCCTGGGAGGAAAAGAGTAGCTGGAGAGATTGTGGAGAAGA 
CTGTTGAGAGGAGACAGAAGAGGATGAT 

caoga^ct^ 

TAGAAGAAGAAAACGAAAAACTTCGGAGGCTAAAGGAGGT^ 
CCAAGTGAACCACCACCAGATCCTAAGTGGAAGCTCCGGCGAACAAACTCTGCT 

TCTCTCTGA 

SEQ ID NO:16, Deduced amino acid sequence of the open reading frame of Pk206 

^DELLK^PAEEGLWQGSL^ 

^P^GE^EDLLL^ 

PVCEMQDMVMMGGLSDTPQAPGRKRVAGEIVEKTVERRQKRMIKN^ 
KQAYTmLEIKVSRLEEENEKLRRLKEVEKILPSEPPPDPKWKLRRTNSASL 

SEQ ID NO:17, Nucleotide sequence of the open reading frame of Pk207 

ATGGCGCAATCCCGATTAT^ 

CAATCG^TCAAAGGCGTTTA 

TCCAGAGATCCATGCCGGTAACGATGGAGCCGATCCAGCT 

CCCTGAAGGTATGGATGATGTTGCAAACCCTAAAACGGCGGCGGAAGAAATCGT 
AGACGATACTCCCCGACCGAGTTTAGAAGAGCAACCGCTTGTACCGCCGAAATC 
TCCACGCGGCACTGCGCACAAGCTAGAGAGTACTCCCGTTGGTCACCCGTCA 

OTCA^C^ 

CCGTGAGCTGTGCTGGTTTAGACGGTTCACCATGG^ 

TGGAAGAGCAAAGGCGAAGAGAAGATGAAACAGAGAGTGACCAAGAGTT^AC 
AAACACCACAAAGCTTCTCCGTTATCGGAGATTGAACT^ 

CTATTACGCAAGCTACCGATGGAACTGCCTACCCAGCCGGGAAAGATGrGATCG 
GATGGTTACCGGAGCAGCTAGACACGGCGGAAGAATCTTTGATGAAAGCAACAA 
TGATATTCAAACGCAACGCAGAACGTGGCGATCCTGAAACGTTTCCTCATTCTAG 
AATCTTAAGAGAAATGAGAGGCGAGTGGTTTTAA 
SEQ ID NO:18, Deduced amino acid sequence of the open reading frame of Pk207 
^^^^L^^AARSRVRPIAQRRLAFGS STSGRTADPEIHAGNDGADPArYTODPEG 

Xdva^-^eivddtprpsleeqplwpkspratahklestpvghpsephfqqkr 

™tSitqatogtaypagkdvigwlpeqldtaeeslmka 
fphsrilremrgewf 

SEQ ID NO:19, Nucleotide sequence of the open reading frame of Pk209 
ATCTCCGTGGCTCGATTCGATTTCTCTTGGTGCGATGCTGATTATCACCAGGAGA 

CGCTGGAGAATCTGAAGATAGCTGTGAAGAGCACTAAGAAGCTTO 
GCTAGACACTGTAGGACCTGAGTTGCAAGTTATTAACAAGACTGAGAAAGCTAT 

TTCTCTTAAAGCTGATGGCCTTGTAACTTTGACTCCGAGTCAAGATCAAGAA.GCC 
TCCTCTGAAGTCCTTCCCATTAATTTTGATGGGTTAGCGAAGGCGGTTAAGAAAG 

GCTGCTACTCTGGGTGGTCCGTTATTCACATTGCACGTCTCTCAAGTTCACATTGA. 
mTGCCAACCCTAACTGAGAAGGATAAGGAGGTrATAAGTACATGGGGAGTTCA 
OAATAAGATCGACTTTCTCTCATTATCTTATTGTCGACATGCAGAAGATGTTCGC 
rAGTTCCCGTGAGTTGCTTAACAGTTGTGGTGACCTCTCTCAAACACAAATATTTG 

CGAA^ATOAGAA^GAGGGACTAACCCACTTTGACGAAA 
CAGATGGCATTATrCTTTCTCGTGGGAATTTGGGTATCGATCTACCTCCGGAAAA 

GGTG^TTTTTCTTCCAAAAGGCTGCTCTTTACAAGTGTAACATGGC 

GCCGTTCTTACTCGTGTTGTAGACAGTATGACAGACAATCTGCGGCCAACTCGTG 

CAGAGGCAACTGATGTTGCTAATGCTGTTTTAGAJ'GGAAGTGATC 

TGGTGCTGAGACTCTTCGTGGATTGTACCCTGTTGAAACCATATCAACTGTOGT 

AGAATCTGTTGTGAGGCAGAGAAAGTTTTCAACCAAGATTTGTTCTTTAAGAAGA 

CTGTCAAGTATGTTGGAGAACCAATGACTCACTTGGAATCTATTGCTTCTTCTG 

GTACGGGCAGCAATCAAGGTTAAGGCATCCGTAATTATATGCTTCACCTCGTCTG 

Sagagcag^^ 

TGTCATTCCCCGACTTACGACAAATCAGCTGAAGTGGAGCTTTAGCGGAGCCTTT 

gaggcaaggcagtcacttattgtcagaggtcttttccccatgcttgctgatcctc 

GTCACCCTGCGGAATCAACAAGTGCAACAAATGAGTCGGT^ 

AGACCATGGGAAGCAAGCCGGAGTGATCAAGTCACATGACAGAGTTGT^ 

TCAGAAAGTGGGAGATGCGTCCGTGGTCAAAATCATCGAGCTAGAGGATTAG 

SEQ ID NO:20, Deduced amino acid sequence of the open reading frame of Pk209 

MSVARFDFSWCDADYHQETLENLKIAVKSTKKLCAVMLDTVGPELQVINKTEKAIS 

LKADGLVTLTPSQDQEASSEVLPINFDGLAKAVKKGDTIFVGQYLFTGSETTSVWLE 

VEEVKGDDVICISRNAATLGGPLFTLHVSQVHIDMPTLTEKDKEVISWGVQl^FL 

Sl^YCRHAEDWQARELLNSCGDLSQTQlEAKffiNEEGLTHFDEILQEAIXJIILSRGNL 

GIDLPPEKWLFQKAALYKCNMAGKPAVL^ 

DGSDAILLGAETLRGLYPVETISTVGmCCEAEKW 

IASSAVRAAIKVKASVnCFTSSGRAARLIAKYF^ 

FEARQSLIVRGLFPMLADPRHPAESTSATOESVLKVALDHGKQAGVIKSHDRVVVCQ 
KVGDASWKIIELED 

SEQ ID NO:21, Nucleotide sequence of the open reading frame of Pk215 
A'TGGCGATTTACAGATCTCTAAGAAAGCTAGTTGAAATCAATCACCGGAAAACA 

AGACCATTCCTCACCGCCGCTACAGCTTCCGGCGGAACCGTTTCTCTGACTCCAC 

CGCAGTTTTCGCCGTTGTTCCCACATTrCTCACACCGTTTATCTCCGCTTTCGAAA 
TGGTTCGTTCCTCTTAATGGACCTCTC 

CAGTCTGCGACACCTITGCACTGGCGCGGAAACGGCTCrGTTI^ 

AAGCTCTGAATCTTAGATTGGATCGAATTAGAAGCAGAACTAGGTTTCCGAGAC 

AGTTAGGGTTACAGTCTGTGGTACCAAACATATTGACGGTGGATCGCAACGATTC 

CAAGGAAGAAGATGGTGGAAAATTAGTCAAGAGTTTTGTTAATGTGCCGAATAT 

GATATC^TGGCGAGAITAGTATCTGGTCCTGTGClTrGGTGGATGAT^^ 

GAGATGTATTCTTCTGCTTTCTTAGGGTTGGCTGTTTCTGGAGCTAGTGATTGG^ 

AGATGGTTACGTGGCTCGGAGGATGAAGATTAACTCTGTGGTTGGCTCGTACC^ 

GATCCTCTTGCAGACAAGGTTCTTATCGGGTGTGTAGCAGTAGCAATGGTGCAGA 

AGGATCTOTACATCCTGGACTGGTTGG^ 

CGTTGGTGGTGCAGTTTACCTAAGGGCACTAAACTTGGACTGGAGGTGGAAAAC 
TTGGAGTGACTTCTTCAATCTAGATGGTTCAAGTCCTCAGAAAGTAGAACCATTG 

mATAAGCAAGG^ 

TrCAACCAGAGTTTGGGAATCCAGACACCCAGACATGGATCACTTATCTAAGGTA 
A 

SEQ ID NO:22, Deduced amino acid sequence of the open reading frame of Pk215 
N^YF^LRKLVEIN^ 

LNGPLFLSSPPWKLLQSATPLHWRGNGSVLKKVEALNLRLDRIRSRTPJPRQLGLQS^ 
VWMLTVDim)SKEEDGGKLVKSFVNWNMISMARLVSGPVLWWMISNEMYSSAF 
LGLAVSGASDWLDGYVARRMKINSWGSYLDPLADKVLIGCVAVAMVQKDLLIffG 

LVG^UmV^ 
FQLTLVAGAILQPEFGNPDTQTWITYLR 

SEQ ID NO:23, Nucleotide sequence of the open reading frame of Pk239 
ATGGTAAAGGAAACTCTAATTCCTCCGTCATCTACGTCAATGACGACCGGAACAT 

CGAA^GGATGGAGAGAGGTATGGGATTCAGCAGATGCGGATTTGCAGCTGATGC 
GAGACAGAGCTAACTCTGTTAAGAATCTAGCATCAACGTTCGATAGAGAGATCG 

AGAATTrcCTCAATAACrcGGCGAGGTCTGCGTTTCC 

GTCGTCTTTCTCAAATGAAATTGGTATCATGAAGAAGCTTCAGCCGAAGATTTCG 

GAGTTTCGTAGGGTTTATTCGGCGCCGGAGATTAGTCGCAAGGTTATGGAGAGA.T 

GGGGACCTGCGAGAGCGAAGCTTGGAATGGATCTATCGGCGATTAA^A^^GOjA 

TTGTGTCTGAGATGGAATTGGATGAGCGTCAGGGAGTTTTGGAGATGAGTAGATT 

GAGGAGACGGCGTAATAGTGATAGGGTTAGGTTTACGGAGTTTTTCGCGGAGGC 

TGAGAGAGATGGAGAAGCTTATTTCGGTGATTGGGAACCGATTAGGTCTTTGAA 

GAGTAGATTTAAAGAGTTTGAGAAACGAAGCTCGTTAGAAATATTGAGTGGATT 

CAAGAACAGTGAATTTGTTGAGAAGCTCAAAACCAGCTTTAAATCAATTTACAA 

TGTTTGGTTAGACAATCTGAACCTTTTCTTGATCAGATTGGTGTTAGAAAGGATA 

CATGTGACCGAATAGTAGAAAGCCTTTGCAAATGCAAGAGCCA^ 
GTCTGCCATCTGCACAAGCATCCGATTTAATTGAAAATGATAACCATGGAGTTGA 

TTTGGATATGAGGATAGCCAGTGTTCTTCAAAGCACAGGACACCATTATC 

GGGTTTTGGACTGATTTTGTGAAGCCTGAGACACCGGAAAACAAAAGGCATGTG 

GCAATTGTTACAACAGCTAGTCTTCCTTGGATGACCGGAACAGCTGTAAATCCGC 

^TOAGAGCGGCGTATITGGCAAAAGCTGCAAAACAGAGTGTTACTCTCGTO^^ 

TCCTTGGCTCTGCGAATCTGATCAAGAACTAGTGTATCCAAACAATCTCACCTTC 

AGCTCACCTGAAGAACAAGAGAGTTATATACGTAAATGGTTGGAGGAAAGGATT 

GGTTTCAAGGCTGATTTTAAAATCTCCTTTTACCCAGGAAAGTTTTCAAAAG 

GGCGCAGCATATTTCCTGCTGGTGACACTTCTCAATTTATATCGTCAAAAGATGC 
ACACAAGATTTA^ 

TCCTTATGATTGGGGAGAAAATTGCTGAAGAGAGATCCCGTGGTGAACAAGCTTT 

CTCA^A^iTG^ATACTTCTTAGGAAAAATGGTGTGGGCTAAAGGATACAGAGA 

AGTAATAGATCTGATGGCTAAACACAAAAGCGAACTTGGGAGC 

TGTATATGGGAACGGTGAAGATGCAGTCGAGGTC^ 

TGACTTGAATCTTCAA 

Iag^aaactot^ 

G^AGAAGCACTAGCCATGGGGAAGITrGTG^^ 
TGTCGAAAGTGCA^ 

aSS^^acaatctctcttgggaagcagcaacacagaggttcatggagtattcag 
^Sa?aXgatcttaaa^ 

AGATCGGHTCCGAG 
ATGTOTAACAGGGAACGA^ 

caaaa^a^tS^^^Saatcaacattgcaaggatctgaatctcgtaccacctcacgt 

TCACAAGCCAATCTTCGGCTGGTAG 

SEQ ID NO:24, Deduced amino acid sequence of the open reading frame of Pk239 
S^TLIPPSSTSMTTGTSSSSSLSMTLSSTNAI^FLSKGWREVWDSADADLQLmD 

^^V^ASITDREIENFLNNSARSAFPVGSPSASSFSNEIGIMKK^ 

APMSRK^MER^ 
VRF^eIfFAEAE^ 

kstVketdeakdwpldwellaclvrqsepfldqigvrkdtcdriveslckc^ 

SPEEOeI^^ 

aycdk\^pjisaatqdlpk^ 

GSr^^^LmLMAKHKSELGSFNLDWGNGEDAVEVQRAAKK^ 
K^RDFIADT^ 

TYKTSEDFVSKVQEAMTKEPLPLTPEQMYNI^WEAATQRFMEYSDLD 

RKMR^RSVPSFlW^ 
VPPHVHKPIFGW 

SEQ ID NO:25, Nucleotide sequence of the open reading frame of Pk240 
ATGGCGACTTITGCTGAACTTGTTTTATCGACTTCTCG 

^CATCCACTAGAA^^ 

GTGATACC/^CCA^GTTTCGTTCCGGAOT 
CAAAG^OTGATAG^ 

AAGCAAGCCAGOTAAGAAAAGAGTGATCTTTGGTATTGGCA^ 
TCGATOTGT^ 

TTTTTATCGGTTCCCGCGAATATTTCGAGCTTGTTAGAAGTAGAGGCATAGCTAA 
AGGAATGACTCCTCCTCCACGATATGTATCTCGAGTTTGCTCGGTTATATGTGCCC 

GCATTTGTTGTTGCAATAGCATTGTTAGTACAAAGAGGATCCCCACGTTTTGCTC 
AGCTGAGTAGTACAATGTTTGGTCTGTTTTACTGTGGTC 

OTAAGCTTCGCTGTGGTTTAGCTGCTCCTGCGCTTAACACTGGTATCG^ 

CATGGGCAATTCTTCTTGGTGGTCAAGCTCATTGGACAGTTGGACTTGTGG^ 

ATOATTrcmCAGCGGTGTAATTGCGACAGACACATITGCTm 

AGACmGGTAGGACACCTCTTACTAGTATTAGTCCCAAGAAGACATGGG^ 

AACTATTGTAGGACnrrG 
TCAGTTOGCCACAATCTCTGTTCAGC 

^^OTCA^^CT^GGTGATCTTACTGAATCAATGATCAAGCGTGATGCTGGCGTCA 
AAgIcTCTG^^ 

^ACATTTTCACCGGCGCATTAGCTTATTCATTCATCAAAACATCCCTAAAACTTT 
ACGGAGTTTGA 

SEQ ID NO:26, Deduced amino acid sequence of the open reading frame of Pk240 

SfafxWstsrctcpcrs^^ 

V^I^DQLGDDDHSKGroRIHNLQNVEDKQKKASQLKKRWGIGIG^LPyGC 

GGm^^ASSWIGSREYFELV^ 
GNTOILVTSAAFVVAIALLVQ 

L^^ffltGRT^^^LGGQ AHWTV GLVATLISFSGVIATDTFAFLGGKTFGRlPI^rSISPKK 
TWEGTTVGLVGCIAn^LSKYLS^ 

KDSGSLIPGHGGILDRVDSYIFTGALAYSFIKTSLKLYGV 

SEQ ID NO:27, Nucleotide sequence of the open reading frame of Pk241 
ATGGCTCAAACCATGCTGCTTACTTCAGGCGTCACC^^ 

ACAAGAGCCCTTTGGCTCAGCCCAAAGTTCACCATCTCTTCCrCTCTGGAAACTC 
CCAAAACC^GCTGCTCCTAAAAAGGTTGAGAAGCCGAAGAGCAAGGrrGAGG 

a5g^5c^gaa^gtctggtgggattggtttcacaaaggcg^ 

C^GTTGGTCGTGTTGCTATGATCGGTTTCGCTGCATCGTTGCTTGGT 

SSsaa^g^ 

AAGCAGA^CATTGCTTCTCTTCTTCATCTTGTTCACTCT 

gctctcggXgacagaggaaaato^ 

ctcttaaacgttgctttcttcttcttcgctgccattaatcctggtaatggaaaatt 

CATCACCGATGATGGTGAAGAAAGCTAA 

SEQ ID NO:28, Deduced amino acid sequence of the open reading frame of Pk241 
MAQTMLLTSGVTAGHFOINK^ 

TKA^KKVEKPKSKVEDGIFGTSGGIGFrKANELFV'GRVAMIGFAASLlJjEALTGKGI 

LAG^LN1£TGIPIYEAEPLLLFFILFTLLGAIGALGDRGKFVDDPPT 
SALGLKEQGPLFGFTKANELFVGRLAQLGIAFSLIGEDTGKGAIAQLNIETGIPIQDIEP 

LVLLNVAFFFFAAINPGNGKFITDDGEES 

SEQ ID NO:29, Nucleotide sequence of the open reading frame of Pk242 
ATOGGTGCAGGTGGAAGAATGCCGGTTCCTACTTCTTCCAAGAAATCGGAAA 

GACACCACAAAGCGTGTGCCGTGCGAGAAACCGCCTTTCTCGGTGGGAGATCTG 
AAGAAA^iCAATCCCGCCGCATTG^ 

ACCTTATCAGTGACATCATTATAGCCTCATGCTTCTACTACGTCGCCACCAATTAC 
CATTC^GC^ACTAC^ 
CTCCTCGTGC^^ 

^ATCcScGAAAGAGATGAAGTAirTGTCCCAAAGC 

GTGGTACGGGAAATACCTCAACAACCCTCTTGGACGCATCATGATGTTAACC^G 

CAGT^GTCCTCGGGTGGCCCTTGT 

TGACGC^TTCGCTTGCCATTTCTTCCCCAACG 

CGCCTCCAGATATACCTCTCTGATGCGC3GTATTCTAGCCGTCT^ 

cSotacgctgctgcacaagggatggcctc^^ 

C^CTGATAGTCAATGCGTTCCTCGTCTTGATCACTTACTTGCAGCACA 
CTCG^GCCTCACTACGATTCATCAGAGTGGGACTGGCTCAGGGGAGCTITGOT 
ACCGTAGACAGAGACTACGGAATCTTGAACAAGGTGTTCCACAACATTACAGAC 
ACACACGTGGCTCATCACCTGTTCTCGACAATGCCGCCTTATAACGCA 

ctacaaao^g^ 

CGTGGTATGTAGCGATGTATAGGGAGGCAAAGGAGTGTATCTATGTAGAACCGG 
ACAGGGAAGGTGACAAGAAAGGTGTGTACTGGTACAACAATAAGTTATGA 

SEQ ID NO:30, Deduced amino acid sequence of the open reading frame of Pk242 

niASCFYYVATNYFSLLPQPLSYLAWPLYWACQGCVLTGIWVIAHECGHHAFSDYQ 
WLDDTVGLIFHSFLLWYFSWKYSHRRHHSNTGSLERDEVFWKQKSAIKWYGKYL 
T^LGRIMMLWOFVLGWPLYLAFNVSGP^YDGFACHFFPNAPIYNDRERLQIYLSD 

^^RGALATVDRDYGILNKWHMTDTHVAHm 
YYQFDGTPWWAMYREAKECIYVEPDREGDKKGVYWYNNKL 

SEQ ID NO:31, Nucleotide sequence of the open reading frame of Bn011 

ATGGCTTCAATAAATGAAGATGTGTCTATTGGAAACTTAGGCAGTCTCCAAACAC 

tcccagactc^^ 

TCCGCCGCTGTGAAAGAGTCCATTCCGGTCATCGACCTCTCCGAT 

CCAATTTGTTACjGAAATGCATGCAAAACGTGGGGAGCGTTTCAGATAGCCAACC 

ACGGGGTCTCTCAAAGTCTCCTCGACGACGTTGAATCTCTCTCCAAAACCTmTC 

GATATGCCGTCAGAGAGGAAACTCGAGGCTGCTTCCTCTAATAAAGGAGT^GT 

GGGTACGGAGAACCTCGAATCTCTCTTTTCTTCGAGAAGAAAATGTGGTCTC 

gotgacaatcgccgacggctcctaccgcaaccagttccttacta™ 

TGATTACACCAAATACTGCGGAATAATCGAAGAGTACAAGGGTGAAATGGAAAA 
ATTCAGCAAGCAGACTTCTATCATGCATATTAGGATC 

GACATCGAATGGGCTAAGAAGACCGAGAAATCTGAATCAAAAATCGGCC 

CGTCATACGACTAAACCATTACCCGGTTTGTCCTGAGCCAGAAAGAGCCATGGGT 

CTAGCCGCTCATACCGACTCATGTCTTCTAACCATTTTGCACCAGAGCAACATGG 

GAGGGCTACAAGTGTTCAAAGAAGAGTCCGGTTGGGTTACGGTAGAGCCCATTC 

CTGGTGTTCTTGTGGTCAACATCGGCGACCTCTTTCACATTCTATCGAATGGGAA 

GTITCCTAGCGTGGTTCACCGAGCA^GGGTTAACCGAACCAAGTCAAGAATATC 

GATAGCGTATCTGTGGGGTGGTCCAGCCGGTGAAGTGGAGATAAGTCCAA^TC 

AAAGATAGTTGGTCCGGTTGGACCGTGTCTATACCGGCCAGTTACTTGGAGTGAA 

^G^GAA^^ 
TTAATCCCACCAATTGA 

SEQ ID NO:32, Deduced amino acid sequence of the open reading frame of Bn011 
MASINEDVSIGNLGSLQTLPDSFI^ 
mSLGVTVDDIEWAKKTEKSESKMGQSVI^^ 

NPTN 

SEQ ID NO:33, Nucleotide sequence of the open reading frame of Bn077 

ItgGCTA^ 

rorTAATGATACTGATATTGATGATCCTGATCATGATCATCATGATGGTGTTCAG 

caagagIag^gYg^at^ 

ATTTCCATGTTGGGTAATTTTAATGGCCGCTTCTGA 

SEQ ID NO:34, Deduced amino acid sequence of the open reading frame of Bn077 
^L\TFSCN S^^QMHAPFDRHANDTETODPDHDHHDGVQQEE ^ G WT ^^^^^^G?1 
^ED^HQDKSSCS^^ 

D^EDTASSPVNSPKVSQffifflQTPPRKHEDYVSSSFVMGNMSGMGDHQIQIQEGDEQ 
KLTMMRNLREGNNSN SNNMD LRARGLC VVPISMLGNFNGRF 

SEQ ID NO:35, Nucleotide sequence of the open reading frame of Jb001 
ATCGCAACGGAATGCATTGCAACGGTCCCTCAAATATTCAGTGAAAACAAAACC 

AATCACTAAC^CATGGTATCGATGAGAGCCTCTTGTCTCGTGCCTATCTGCATATG 
GACTCTTTCTTTAAGGCCCCGGCTTGTGAGAAGCAGAAGGCTCAGAGGAAGTGG 
GGTGAGA^TCCGGTTACGCTAGTAGTTTCGTCGGGAGATTCTCCTCAAAGCTCC 
CGTGG^AG^GAGACTCTGTCGTTTAAGTTCTCTCCCGA 

AACCGTTAAAGACTTTGTTTCTAAGAAAATGTGCGATGGATACGAAGATTTCGGG 

aagg^a^aa^ 

GAGCTTCTTGGAATGAGTCTTGGGGTCGAGAGGAGATATTT^ 
AAGACAGCGATTCAATATTCCGGTTGAATTACTACCCGCAG1GCAAGCAAC 

AGCTrGCACTAGGGACAGGACCCCACTGCGACCCAACATCOT 
TCAAGACCAAGTTGGCGGTCTGCAAGTTTTCGTGGACAACAAATGGCAATCCATT 

CCTCCTAAC^ 

CGAATGGAAGATACAAGAGTTGTTTGCATCGGGCGGTGGTGAAC 

AAAGGAAGACGTTTGCATTCTTCCTATGTCCGAAAGGGGAAAA^ 

CACCAGAAGAACTAGTAAACGGAGTGAAGTCTGGTGAAAGAAAGTATCCTGATT 

TTACGTGGTCTATGTTTCTCGAGTTCACACAGAAGCATTATAGGGCAGACATGAA 

CACTCTTGACGAGTTCTCAATTTGGCTTAAGAACAGAAGAAGTTTCTAA 
SEQ ID NO:36, Deduced amino acid sequence of the open reading frame of Jb001 
A^FT SGDSC^^^ 

YPDFTWSMFLEFTQKKYRADMNTLDEFSIWLKNRRSF 

r^AO^G^AOAGTC^AAAGATOACGAOTCATTICQAGTCCATQOCCGA 

SSSSaSSaSS^aa^ 

TTGAGCCATAGT(^ACAAGAAGCTCGTGGAGGAAGAGGTGAAGAAATGG 
I^GGGC^ 

GGTGGA^GGTGAGGAAAAAGAGAGTGGTGTACATGGCnTTCATGGGGAG 

a^aca^t^ 

TAAAGAATGAGCAGGTGGTGTTGGTGGTCGTAGCGTAAAAGATACGGTAG 

r aaaggacagc^Xgctaaggaaagtgtaggagaaggtgctcagaaagcgggca 

GTGCIACGAGT^ 

AAGAAGCTGGAAATATGACAGCTGAACAGGCGGCGAGAGCAAAA^^^ 
rTGCAGAAAGCTGTTGAAGCTAAAGAGACTGCGGCGGAGAAAGCTCAGAGAGCT 

5cc§ag^atgaaSaaacaggaa^ 
aSgI^acactcttcagaaagctgtggaagc^ 

a?tg^ca^aaXgccg^ 

gSag^a^a^^ 

g^gggtcgaI^^ 

AGCGGTTG^AGGTACA^^ 

g^SgSgg^ 

TgWgGTGGTTGGGTATACGCTCGAGGAAG 

ACCAAGAGATGCATCAGGGAGGTGAGGAAGAAAAGCAACCAGGGTTTOTCTCAG 
GAGCAAGGAGAGACTTTGGAGAAGAGTA^ 

GATGTCTACGGCTATGGAGCAAAAGGAATACCCGGAGAAGGGAGGGGAGATGTT 

ggggI^aS^^ 
atatgagacgggaacatggacaacgttga 

SEQ ID NO:38, Amino acid sequence of the open reading frame of Jb002 
===== 

GV^VKDWAEKGQQAKESVGEGAQKAGSATSEKAQRASEYAIEKG^AG^ 
TAEOAARAKD Y AIXJKAVEAKETAAEKAQRASEYMKETGSTAAEQ AARAKDYTLQ 
KA\^AKDVAAEKAQRASEYMTETGKQAGNVAAQKGQEAASMTA^ 
CTAAGYDQBTTVEGGKG AAHY AGV AAEKAAAVGWT AAHFTTEKVVQGTKAVAGT 
WGAVGYAGHKAVEVGSKAVDLTKEKAAVAADTWGYTARKKEEAQHRDQ^ 

G^^WGYGPKGTVEEARRDVGEEYGGGRGSERYVEEEGVGAGGVLGAIGETIAEI 
AQTTKNWIGDAPVRTHEHGTTDPDYMRREHGQR 

SEQ ID NO:39, Nucleotide sequence of the open reading frame of Jb003 
AT^CTAAGTCTC^^ 

CATCCCGAGTCACGCGGTCGGTTCGAGCCAAAGATTCTTATGCCGACAGAGGAA 
GCTAACCCGGCTGACCAAGACGGAGATC 

GTCGCTGGTTCTTCTGGATATGGAAACTACAGACACCAGGCTG 

CATATCAAATACTAAGAAAAGGAGGTTTAAAGGAAGAGAACATAGTCGT^GA 

TGTATGATGATATCGCAAACCACCCACTTAATCCTCGTCCGGGTACTCTCATCAA 

CCATCCTGACGGTGACGATGTTTACGCCGGAGTC 
AGTGTTACG^CTGCAAACTTCTACGCTGTACTCCTAGG 

aagg^g^ 

ATTATGCGGATCATGGTGGTCCCGGAGTTCTTGGGATGCCAAATACGCCTCACAT 

ATATGCAGCTGATTTTATTGAAACGCTTAAGAAGAAGCATGCTTCCGGAACATAC 

AAAGAGATGGTTATATACGTAGAAGCGTGTGAAAGTGGGAGTATTTTCGAAGGG 

ATAA^^^^^^GGACTTGAACATTTACGTAACAACGCKITTCAAATCtCACAAGAG 

AG^GTTATGGAACATATTGTCCTGGCATGAATCCGTCACCCCCATCTGAATATA 

TCACTTGCTTAGGGGATTTATATAGTGTTO 

CAAT^AAAGAAAGAGACCATAAAGCAACAATACCACACGGTGAAGATGAGGA 
CATCAAACTACAATACCTACTCAGGTGGCTCTCATGTGATGGAATACGGTAACAA 

TAGTATTA^GTCGGAGAAGCTTTATCTTTACCAA<^ 

AATCTCCCACTAAACGAATTACCGGTCAAGTCAAAAATAGGAGTCGTTAACCAA 
CGCGATGCGGACCTTCTCTTCCTTTGGCATATGTATCGGACATCGGAAGATGGGT 
CAAGGAAGAAGGATGACACATTGAAGGAATTAACTGAGACAACAAGGCATAGG 
AAACATTTAGATGCAAGCGTCGAATTGATAGCCACAATTITGTTTGGTCCGACGA 

TGAATGTTC^^ACTTGGTTAGAGAACCCGG 

ATGTCTTAAATCGATGGTACGTGTATTTGAAGAGCATTGTGGATCACTAACGCAA 
TATGGGAT^AAACATATGCGAGCGTTTGCAAACGTTTGTAACAACGGTGTGTCCA 

aag^tga^ 

GCTACACGGTGCATCCATCAATCTTAGGCTATAGCGCCTGA 

SEQ ID NO:40, Deduced amino acid sequence of the open reading frame of Jb003 
MA^CWRPALLLLLVLLVHAESRGRFEPKILMPTEEANPADQDGDGVGTRWA^ 

AGSSGYGNYRHQADMCHAYQILRKGGLKEENrV^LMYDDIANHPLNPRPG^ 

DGDDWAGVPI^YTGSSVTAANFYAVLLGDQKAVKGGSGKVIASKPNDHIFVYYA 

DHGGPGVLGMPNTPHIYAADF1ETLKKKHASGTYKEMVIYVEACESGSIFEGIMPKD 

LNn^TTASNAQESSYGTYCPGMNPSPPSEY 
KQQYHTVKMRTSNYNTYSGGSHVMEYGNNSIKSEKLY 

KSKLG WNQRD ADLLFLWHMYRTSEDG SRXKDDTLKELTETTRHRKIILD AS VELIA 
TILFGPTMNVLNLVREPGLPLVDDWECLKSMWVFEEHCGSLTQYGMKHMRAFAN 
VCNNGVSKELMEEASTAACGGYSEARYTVHPSILGYSA 

S£0 2D MWi. Nucleotide sequence of the o^^^^X^TrrraXTC^TTC^ 
ATCKjACGGTGCCGGAGAATCACGACTCGGTGGTGAT 

GT 

TOAOTCAAA?AGCTCTGGA^ 
rrTTTACTTGGirTGATrTCTCT^ 

rAomCAACGCAAGATCCnTGAGCGTTCTGGTTTAGGGGAAGACACTTATGTCC 
r^GAAGCTATGCATT^ 



rTTGTTTCGAGTTGGTGGCTCTGCGGTTTTGCTATCGAACAAGTCGAGGGACAAG 
AGA^<^^^ 



GATAAAGCTTTCCGTTGTGTTTATC^ 

ITCGTTG 
TCACTACA1 



rTTTrGTTGTCGAAAGATCTAATGGCGATTGCAGGGGAAACTCTCAAAACCAATA. 

^TA^GGGTCCTOTOTCTACCGATAAGTGAGCAGATO^ 
ACTCTAG^TCT^AAG^J^^ 



ACCT^C^ 
AGTTATTAG 

ID Deduced amino acid sequence of the ^J^^^(^ 5 ^ m T ^ 

ySa^a^^gn^^ 

VTLSY 

SEe ZD MX Nucleotide sequence of the open r ^ in Sfr^°{^^ n ^ rrT ^ r ^ c 
CGATACA^^L^AATGCCCATGCATGCAAAAGCAGCTGATCAGTTACCACCAAA 
^r^A^fiGCCK5TTTAGGTGGTCTAGGCGGAGCCGGAGGTGGTTTAGGTGGAGTT 

got££ctc?g^^^ 

ACGGCGTGGTCGGTGGTGTGATCGATCCACATCCTTAA 

SEQ ID N0:44, Deduced amino acid sequence of the ^"^^^"^{^^t ppl ™ 

7gSlggvgg^vg^lggv^ 

GGLGGLGGAGGGLGGVGGLGKAGGIGVGGGIGGGHGWGGVIDPHP 

SEQ ID NO:45, Nucleotide sequence of the open reading frame of Jb009 
ATGGCAAGCAGCGACGTGAAGCTGATCGGTGCATGGGCG^ 

IgSgagga^ 

rr^GGOTCTAAGAG^ 

?aS£S 

===== 

CCCGCTCTAA^CCTGTCATGCCCGAGA^ 
AGATCTTTCCTAAGCCGCAGGCCTAA 

SEQ ID NO:46, Deduced amino acid sequence of the open reading frame of Jb009 
^SSDVKLIGAWASPFVMI^RIALNLKSVPYEFLQETFGSKSELLLKSNPVHKI^VL 

tITadct^esn^^dtwsssgpsilpsdp 

TOI^VS^E^TPSLSKW^ 

SEQ ID NO:47, Nucleotide sequence of the open reading frame of Jb013 

ATCGCGTCTCAACAAGAGAAGAAGCAGCTGGATGAGAGGGCAAAG 

GACCGTCGTGCCAGGTGGTACGGGAGGCAAAAGCTTCGAAGCTCAACAGCATC 

cgctgaagggaggagccgaggagggcaaactcgaaatcag^^^ 

AAGGATATCAGCAGATGGGACGCAAAGGTGGTCTTAGCACCGGAGACAAGCCTG 

GTGGrcAACA^ 
ACCAAGACCTAA 

SEQ ID NO:48, Deduced amino acid sequence of the open reading frame of Jb013 
MASQQEKKQLDERAKKGETVWGGTGGKSFEAQQHIAEGRSRGGQTOKEQLGTEG 

YQQMGRKGGLSTGDKPGGEHAEEEGVEIDESKFRTKT 
AAGTGGA^^^cS^rcAAGAAGTGAGGTCCATCTACTrACAAroGTCCGCAGAA 

§AAC?^GLGTTGTTGGGCGTTTrCGACTACTGC^ 

OAC ^^^g^^^ CACT GAGAAAGATTATCCTTACCGTGGATT 
GAAAA^^^CTTGAAGAATrC^ 

^?A?^ACCG^GTrGT(KJTACAAATCTTGATCACGCCK3TAGTT(KnX3TC 

AAATCCGG^AAGTGTGGGATTGCGGTTGAAGCCTCGTACCCGGTTAAGT^ 
CAAACCCGGTTCGTGGAAATACTATCAGCAGTGTTTGA 

SEQ ID NO: 52, Amino acid sequence of the open ^"^aroe 7 

====== 

NTT^DYPYRG^^ 

ERNLAASKS GKCGIAVEASYPVKYSPNPVRGNTIS SV 

SEQ ID NO:53, Nucleotide sequence of the open reading frame of Jb024 

^tcatggcaa^ 

™g^c??atgaottc^ 

ctagaatSt^caaagagcttctggtg 

ScTATCATTCCG^ 

SSactaagagccataagaagctga^ 

CGGCCHTCGACCTCCGGATGGTATTGTCATCAATGGA^^ 

ATGGTAGTCCTTTTGGGACCATAAACGTTGAACCAGGACGAACATATCGTmCG 

Sga^c^c^^ 

AGTAATGACTACTACATTGTC 

GTGGAGTCGCTGTCTTGCGCTACTCTAATTCCCAAGGACCCGCTTCAGGTCCACT 
CAACACCCCTAAAGCTTGCTCAGCAATACAACATCTCAGGGGTATA^ 
A^CCCAAAAAGGCCAATGAATAGGCACCCCAGG^ 

T^GAGCTACCACTTGGATGGTTATGCATTTTTTGTTGTTGGGATGGACTTTGGTC 
TGTGGACAGAAAATAGCAGAAGCACATACAACAAGGGTGATGCAGTTGCT 

CTACTACGCAGGTC'T^ 

SctgtcatgtgI^^ 

a^Sataotgagtgtggttaatccagaga^ 
cgttcctaaaaactctatatattgtggtcggctctcaccattacaaaagtaa 

SEQ ID NO:54, Deduced amino acid sequence of the open reading frame of Jb024 

rfa^Sgva^^^ 

O^FKYGQfwTDW^ 

wtet^sty^gdavars^ 
elylswnpeididssensvpknsiycgrlsplqk 

SEQ ID NO:55, Nucleotide sequence of the open reading frame of Jb027 

ATCCTTCTAATTCTAGCGATTTGGTCACCAATTTCACACTC^ 

ACACTCAGGTCGCACAAAGTGTATCGCCGAAGACATCAAAAGCAATTCAATGAC 

TGTTGGTAAATACAACATCGA^ 

CACAAAATTTCCGTCAAGGTGACGTCTAATTCCGGTAACAATTACCATCACGCGG 

aIcaagta^ 

AGTGGAAGACTGGTGTTCAATCTAAAAGCTGGGCTAATGTTGCTAAGAAGAGTC 

a^ctc^gWggaatto^ 

TCATGAAGAGATGTATTATCTTAGAGATAGGGAAGAAGAGATGCAAGACTTGAA 
CCGOT^ACTAACACAAAAATGGCGTGGTTGAGTGTO^ 

ATAGGAGTTGCAGGGATGCAGTTTTTGCACTTGAAGACGTTTTTCGAGAAGAAGA 
AGGTTATCTGA 

SEQ ID NO:56, Deduced amino acid sequence of the open reading frame of Jb027
M^nAXWSPISHSLHFDLHSGRTKCIAEDIKSNSMTVGKYNIDOT 

^v™sgWhhaeqvdsgqfafsaveagdymacftavdhkpe 

GVQSKSW^AKKSQVEVMEFEVKSLLDTWS^ 
TKMAWLSVLSFFVCIGVAGMQFLHLKTFFEKKKVI 

SEQ ID NO:57, Nucleotide sequence of the open reading frame of OO-1
ATCGCACATGCCACGTTTACGTCGGAAGGGCAGAATATGGAGTCGTTTCGACTCT 

TGAGTGGCCACAAAATGCCAGCCGTTGGACT 

AGCCGCCCACGCCGTTGTCACTGCAATCGTCGAGGGTGGCTATAGGCACATAGAT 
ACAGCTTGGGAGTATGGTGATCAGAGAGAGGTCGGTCAAGG^

? a Tr^r A rGrTGGCCTTGAAAGGAGGGACCTCTTTGTGACCTCGAAGCTTTGGTGC 

ac^Sg^ 

TTrAA^AGAGTACCnTGATCTCTACITGATTCACTGGCC^ 

ArTT^3GAG^GAAATGGAGAATCTTTCC 
TG^CTA^Cm^AGTCACTAAGCTC 

ATCCCTGCCGTTT 

tSaa^tgcaaga^^ 

aga!gc^gaX?aIgacaccgggacagattctag^ 

tgaccagaaacgagtgatagacggtgaggatcttttcgtcaacaagaccgaagg 
tccattccgtagtgtggctgatctatgggaccatgaagactaa 

SEQ ID NO:58, Deduced amino acid sequence of the open reading frame of OO-1
MAHATFTSEGQNMESFRLLSGHKJPAVGLGTWRSGSQAAHAVVTA 

AWEYGDQREVGQGIKRAMHAGLERRDLFWSKLWCTELSPER^ 
QALNSITDQKRVIDGEDLFVNKTEGPFRSVADLWDHED 

ACTO^CWA^CC^GGAGCT^^^ 

Ig^^Stocacaagagaccctacaaatgttoaag^tc^^go 
«Sg1agca?a^Wggaa<k^^^ 

g^to^^gotccaatatggacaccccttatcccagcatcaticaatgagg 
^^a^gaX^mgtctgaggttccgatgaaaagagcgggtcagccaa 

TTGA^^TCGCACCATCCTATGTITrCTrGGCGTGTAACCACTGCTCTTCTTACTrC 

actggtcaagttcttcaccctaatggaggagctgtggtaaatgcgtaa 

SEQ ID NO:60, Deduced amino acid sequence of the open reading frame of OO-2
A^EK^GSEWMKRAGQPIEVAPSYVFLACNHCSSYFrGQVLHPNGGAVVNA 
SEQ ID NO:61, Nucleotide sequence of the open reading frame of OO-3

OA^TTQ^AGGAGACACTGTTGAGCCTGATAATGATCCTCCTCAGAAGATGGGG 
t^AOTCGG(^A^AACGTTGAACCCGACTTCAGCTATTATGTATGGAAACAGAG 

aSgga^^ 

TrGTGCCATGCTTGGAGAATGGGCAGAGGCTGCAAAAGACCnTCACCTTGCATCT 
ACGAIAGACT 

GCACATAAGOTGAGGAGCACCGTAGAAAGTA^^ 
GAGGACAAAAAGOT 

GCCXA^G^^^^^CT ^^^^^^^^^^^^^^^f^^^^^^^^^^Q'^^^ 

TCGG^G^TATGCCCGGTGGA^CCCAGGAGG 
GAA^GCG^ 

GT^CGGTATCCCAGGTGCAGGCGGTGGTATGCCTGGTGGTGGCGGTATGCCTG 

GTGGTATGGAC^CAGCAAAATATTGAATGATC 
CGACCCTGAAGTCATGGCTGCTCTTCAAGATGTGAT 

g2gaag£^ 
aaatttgcaggacctcagtaa 

SEQ ID NO:62, Deduced amino acid sequence of the open reading frame of OO-3
^S^SELK^GCKSDPSLLTTPSLSFFRDYLESLGAKI^ 

gggISaI^^ 

DPEXMTAFSDPEVMAALQ 

SEQ ID NO:63, Nucleotide sequence of the open reading frame of OO-4
ATOAAGGTTCACGAGACAAGATCTCACGCTCACATGTCIGGAGACGAACAAAAG 

AAGGGAAATTTGCGGAAGCACAAAGCAGAAGGGAAACTTCCAGAATCTGAACA 

GTC^CA^^^GAAGGCAAAGCCTGAAAACGATGACGGACGTTCTGTCAACGGCGC 

?ggI^^ttcagagtacaatgagttctgcaaagc^ 

GTCCATTGATCAGATTAAAGAAGTTCTCGAAATCAACGGCCAAGATTGTTCTGCT 
2cAGAA^AC^ 

CTAAATGTCCTTTATGCGGAGGAACTTTAATTTGCG 
ATGTCGAGGTGAGAT^^ 

TCCTAGAAAGGAAGAGCCAGTTAAAATCCCTGATTCTGTCATGAA 

TCTGAC^ATC^GAAACACCAGGACCCTAAAAG 

GGCTCTGCTGATAAACCCTTTGTGGGAATGATGATCTC 

Sagaa^a^^^ 
TCCAATACTGTTCAAGGCGTAACATGTTTGGTGGTTTCGCCAGCTGAAAGAGAAC

T^GAAAAGAGGAGTGTACATGGACACAAAACTTCAGGAGAGAGGAGGAAAGA 

j^p^^J^^^^^^TO(^CTCTTGTATAACTGTGCCTTCTCGATATGCGATTTCK^ 

^^^^^^CGT^^GAGTATTGTATTATGCAGCTAGTCACGGTACCCGATAGTAA 

^AA^TGmOTCAAGAGAGGGAAAGTAGGAGA 

GAGGCTCGAGGAATGGGAG 

T^^GAG^AGATAGCAGGGAATGAGTTTGAGCCATGGGAACGTGAGAAGAAGA 
???I^G^^CTCATAAG 

CTTGA^CGTTTGTTGCAAACT^ 

ScGCOTGATGGAGCTTGGATTGGATCCGCCCGATCTACCTATGGG^ 
ACTGATATTCCACT^ 

TCA^CAACAAAAGAGACAGGTCAGAAAGCTGAAGCAATGTG^ 

S^cgItg^ct^ 

TCAATGAAOTGCAGACCATGCGGCCTCTGCT^ 
CAGA^GATCTCGTTT^TAGGGGACATGCGAGGAGACACACTCGA 

gtctg^tIStacaaaaaactt^ 

TGAAGA^ACAAGATGGTTGTGAAGTATCTCGAGACTACTTATGAGCCTGTGAAA 
CTCTCTGA^G^^ 

ATCCAA^CC^CATrAGATGATATCAAGAAGTrACCAAATAAGG 

gtctgggtc^ 

gctgtatgctctcttccggttcctggttatatg™ 
aga^gcagctgcagaagcagcaaggtatggtttta 

AGGGTTTC^GTATTAGCCGTAGCATCACTTGGTC 

AGTCCACCAGAGGATACGAAGACGTTGGAAGATAAAAA 

A^AGGGAGGAAGAAAACTGAAGAGTCGG^^ 

AA^^GT^CCTTGTGGACGGTTGGTTCCATCGGAACATAAGGACAGTCCACTTGAG 
TACAACGAGTACGCGGTTrATGATCCGAAACAGACA^^ 

GAAGTGAAGTACGAGGAGAAGGGAACTGAGATAGTCGATGTCGAACCAGAGTA
G 

SEQ ID NO:64, Deduced amino acid sequence of the open reading frame of OO-4

mkWtrshahmsgdeqkkgn^^ 
gdaaIe^^ 

WSPAERERGGTSKMVEAMEQGLPWSEAWLmSVEKHEAQPLE^ 
KGIPWDKODPSEEAffiSFSAELKMYGKRGVYMDTKLQERGGKIFEKDGLLYNCAFSI 
CDLGKGR^YCIMQLVTWDSNLNMYFKRGKVGDDPNAEERLEEW^ 
RLFEEIAGNEFEPWEREKKIQKKPEG'CFFPIDM^ 

SFVANPIKVLCGQEIYNYALMELGLDPPDLPMGMLTDIHLKRCEEVLLEFVEK^TT 
KKIGVKGLGRKKTEESEHFMWRDDIKVPCGRLVPSEHKDSPLEYNEYAVYDPKQTSI
RFLVEVKYEEKGTEIVDVEPE 

SEQ ID NO:65, Nucleotide sequence of the open reading frame of OO-5
ATCTCTACCCCAGCTGAATCTTCAGACTCGAAATCGAAGAAAGATTTCA^TACTG 

CTA^CTCGAGAGGAAGAAGTCTCCGAACCGTCTCGTCGTCGATGAGGCTATCAA 

Sacaga^ 

tagatga?otagg^Wtga^ 
tagggaacttgttgaacttc 

GTTAAGCCACCGAAGGGAATTCTTCTTTATGGACCACCTG 

?GATCGC?CGTGCTG^CTAATGAAACGGGTGCOT 

A?CTGAGATCATOTCC^ 

aSSag^Iggctgagaaaaatgcgccttcaatc^ 

CTG^^Gi^CCATACAAAGAACATGAAGCTGGCTGAAGATGTGGATCTCGAAAGG 

Sctcaaagg^ 

AGGCC^CCCTGCAATGCATCAGGGAGAAGATGGATGTGATTGATCTGGAAGATG 
C^GAGCTOTGACAATCWrn 

ra^AAG^c^^^ 

AGAGTCTTGAACCAGCTTTTGACTGAGATGGACGGAATGAATGCCAAGAAAACC 
GTC^(^T^TCGGAGCTACCAACAGACCTGACATTATCGATTCAGCTCTTCTCC 

ctcSgga^^^ 

TCTCAATATC^^ 

Itgg^aot^^ 

TTTGCCAGAGAGCTTGCAAGTACGCCATCAGAGAAAACATTGAG^ 

AAAAGGAGAAGAGGAGGAGCGAGAACCCAGAGGCAATGGAGGAAGATGGAGT 

GG^GAAGTATCAGAGATCAAAGCTGCACACTTTGAGGAGTCGATGAACTAT 

gSotLgagtgtgagtgatgcagacatcaggaagta^ 

?gSgS^ 
gggacgatgatgatctctacaattag 
SEQ ID NO:66, Deduced amino acid sequence of the open reading frame of OO-5
MOTAESSDSKSKKDFSTAD^RKKSPNRLVVDEAINDDNSWSLHPAT^ 

DPAEYCWAPDTEIFCEGEPVKREDEEP^DDVGYDDVGGVRKQMAQIRELVELPLR 

^OLFraiGVKPPKGILLYGPPG 
NLRKAFEEAEKNAPSDFIDEIDSIAPKI^ 

MGATNKPNSIDPALRRFGRFDREIDIGVTDEIGRLEVLRIHTKNMK^ 

DTT3GYVG ADIJ^ALCTEAALQCIREKMDVIDLEDDSIDAEIINSMAVTNEHFHTALGN 

FYGPPGCGKTLLAKAIANECQANPISVKGPELLTMWFGESEANVREIFD^(^APC 
VLFFDELDSIATQRGGGSGGDGGGAADRVLNQLLTEMDGMNAKKT^ 

mSALLRPGPXD^ 

EICORACKY AIPnENIEKDIEKEKRRSENPE AMEED GVDE VSEIKAAHFEE SMKYARRS 
VSDADIRKYQAFAQTLQQSRGFGSEFRFENSAGSGATTGVADPFATSAAAAGDDDD 

LYN 

SEQ ID NO:67, Nucleotide sequence of the open reading frame of OO-6
ATCGACAAATCTAGTACCATGCTTGTTCACTATGACAAAGGGACTCCAGCAGTTG 

CTAATGAGATTAAAGAAGCTCTCGAAGGAAATGATGTTGAAGCTAAAGTTGATG 

CCATGAAGAAG^^AATTATGCTTTTGCTGAATGGTGAAACCATTCCTC 

CA^^ACCAT^ATAA^^TATGTGCTGCCTTCTGAAGACCACACCATCCAAAAGCTT 

CTGTTGCTGTACCTGGAGCTGATTGAAAAGACAGATTCGAAGGGGAAGGTGTTG 

CCTGAAATGATTTTGATATGCCAGAATCTTCGTAATAACCTTCAGCATCCGAATG 

AGTACATCCGTGGAGTGACACTGAGGTTTCTCTGTCGGATGAAGGAGACTGAAA 

TAGTGGAACCTTTGACTCCATCAGTGTTACAAAATCTGGAGCATCGCCATCCATT 

TGTTCGCAGGAATGCAATTCTGGCAATCATGTCGATATATAAACTTCCACA'TGGC 

GACCAACTCTTCGTGGATGCACCTGAAATGATCGAGAAAGTTCTATCAACAGAA 

CAAGATCCTTCTGCCAAGAGAAATGCATTTCTAATGCTCTTTACCTGTGCCGAAG 

AACGTGCAGTGAATTATCTTCTGAGCAATGTTGACAA 

ATCACTTCAGATGGTGGTGCTGGAGCTGATTCGAAGTGTGTGTAAGACTAAACCA 

GCGGAGA^GGGAAAATATATTAAAATTATTAm^ 

CTGCAGTTATCTATGAATGTC^^ 

GCmTrCGAGCTGCTGCCAACACCTACTGCCAACTTCTTCTTTCTCAGAGTGACA 

ACAATGTGAAGCTTATCTTGCTCGATCGGTTGTATGAGCTTAAGACATTGCACAG 

AGATATCATGGTTGAGCTGATAATCGATGTGCTCAGAGCACTCTCAAGCCCAAAC 

CTTGATATCCGCAGGAAGACACTTGACATTGCCCTTGACTTGATTACCCATCATA 

ATATTAATGAAGTCGTTCAAATGTTGAAGAAAGAAGTTGTGAAGACACAGAGTG 

GAGAACTTGAGAAGAATGGAGAGTACAGGCAAATGCTTATTCAAGCCATCCATG 

CTTGTGCAGTTAAGTTCCCCGAAGTTGCAAGCACAGTGGTCCATCTTCTGATGGA 

TTTCCTGGGAGATAGCAACGTGGCTTCAGCTCTTGACGTGGTTGTTTTCGTTAGA 

ga^aW^^ 

GACACGTTCTATCAGATCCGTGCAGGAAAGGTCTGCCCTTGTGCACTTTGGATCA 
TTGGTGAGTATTGCCTATCACTTTCAGAAGTTGAGAGTGGCATITCAACTATTAC 

ACAATGCCTTGGCGAATTACCATTTTACTCTGTTTCTC 

GAGACATCAAAGAAGATTCAGCCTACCTCTTCTGCCATGGTGTCCTCTAGAAAGC 

CAGTTATTCTTGCTGATGGAACTTATGCTACACAAAGCGCAGCCTCTGAAACCAC 

ATrCTCCTCGCCTACAGTTGTTCAAGGATCACTGACrrCTGGAAATTTGA^GCA 

CTCCTTCTAACTGGTGATTTTTTCCTCGGAGCTGTGGTTGCTTGCACGTTCACCA 

ACTTGTTCTTAGGTTGGAAGAGGTTCAGTCTTCCAAAACTGAAGTAAACAAGACA 
GTATCACAGGCTTTGCTAATCATGGTTTCTATTTTGCAACTTGGGCA
^CTCCACACCCTATTGATAATGATTCGTATGAGCGGATTATGTTGTGCATAAAA 

^GCTTTGCCATAGGi^TGTTGAGATGAAAAAGATATGG 

AGAGTTTTGTCAAGATGATTTCTGAAAAACAGCTTAGAGAGATGGAGGAACTGA 
AGGCAAAG^G^CAAACAACTC^ 

TCTAAAGAGTCGGAAGGGAATGAGTCAACTTGAGTTGGAAGACCAGGTACAAGA 

TGACCTAAAGCGTGCAACTGGAGAATTCACCAAGGACGAGAACGATGCTAACAA 

AOTAACCGCATTCTTCAACTCACAGGATTCAGTGACCCAGTCTATGCTGAAGCA 

TATG^CGGTACACCATTATGATATTGCTCTTGAAGTTACAGTAATCAACCGAA 

CCAAGGAAACCCTrCAGAACTTGTGCTTGGAGTTAGCAACCATGGGTGATCTCAA 

ACrrGTTGAGCGTCCTCAGAACTATAGTCTGGCACCTGAAAGAAGCATGCAGATT 

AAAGCAAACATCAAGGTCTCGTCCACAGAGACAGGAGTCATATTCGGGAACATC 

ACATTGATATCATGGACTATATCTCCCCTGCTGTGTGCTCAGAGGTTGCmCAGA 
ACTATGTGGGCAGAGTTTGAATGGGAAAACAAGGTTGCTGTGAACACCACAATT 

CAAAACG^^ 

CTCACTGCTCCATCTGCAATAGCAGGTGAATGTGGATTCCTTGCAGCAAACTTAT 
ATGCAAAAAGTGTATTTGGTGAGGATGCTCTTGTGAATTTGAGTATTGAGAAGCA 
AACGGATGGAACATTGAGTGGTTACATAAGGATAAGGAGCAAGACGCAAGGGA 
TTGCTCTAAGTCTTGGAGACAAAATCACCCTCAAACAAAAGGGTGGTAGCTGA 

SEQ ID NO:68, Deduced amino acid sequence of the open reading frame of OO-6
MDKSSTMLVHYDKGTPAVANEIKEAI.EGNDVEAKVDAMKKAIMLLLNGETIPQLF1 

TIIRYVLPSEDHTIQKLLLLYLELffi™ 

LMLCRMKETEIVEPLTPSVLQNLEHPOHPFVRRNAILAIMSIYKLPHGDQLFVDAPEMI 

EKVLSTEQDPSAKRNAFLMLFTCAEERAVNYLLSNVDKVSDWNESLQMVVLELIRS 

VCKTKPAEKGKYIKmSLLSATSSAVIYECAGTLVSLSSAPTAIRAAAmYCQLLLSQS 

DNNVKLILLDRLYELKTLHRDIMV^ 
VVQMLKKEVVKTQSGELEKNGEYRQMLIQAIH^ 

NVASALDVVWVpllffiTOPKLRVSnTRLLDTFTQIRAGKVCPCALWnGEYCLSLSEy 

VSQAIXIMVSILQLGQSPVSPHProNDSYERIMLCIKLLCHRNVEMKKIWLE 

OflSEKQLREMEELKAKTQTTHAQPDDLroFFHLKSRKGMSQLELEDQVO^^ 

TCEFTKDENDANKLNPOLQLTGFSDPWAEAWTVHHYD^ 
CLELATMGDLKLVERPQNYSLAPERSMQIKAMKVSSTETGVIFGNIWETSNWEP^ 

VWLNDIHmiMDYISPAVCSEVAFRTMWAEFEWENKVAVNTTIQNEREFLDHIIKST 

NMKCLTAPSAIAGECGFLAANLYAKSVFGEDALVNLSIEKQTDGTLSGYIRIRSKTQG 

IALSLGDKITLKQKGGS 



SEQ ID NO:69, Nucleotide sequence of the open reading frame of OO-8

ATGGCGAAATCTCAGATCTGGTTTGGTTTTGCGTTACTCGCGTTGCTTCT 

AGCCGTAGCTGACGATGTGGTTGTTTTGACTGACGATAGCTTCGAAAAGGAAGTT 

GGTAAAGATAAAGGAGCTCTCGTCGAGTTTTACGCTCCCTGGTGTGGTCACTGCA 

AGAAACTTGCTCCAGAGTATGAAAAGCTAGGGGCAAGCTTCAAGAAGGCTAAGT 

CTGTGTTGATTGCAAAGGTTGATTGTGATGAGCAAAAGAGTGTCTGTACTAAATA 

TGGTGTTAGTGGATACCCAACCATTCAGTGGTTTCCTAAAGGATCTCTTGAACCT 

CAAAAGTATGAGGGTCCACGCAATGCTGAAGC^ 

GAAGGAGGCACCAACGTAAAATTAGCTGCAGTTCCACAAAACG^^^G 
ACACCTGACAATTTCGATGAGATTGTTCTGGATCAAAACAAAGATGTCCTAGTCG 
AATTTTATGCACCATGGTGTGGCCACTGCAAATCACTCGCTCCCACATACGAAAA

GGTAGCCACAGTGTTTAAACAGGAAGAAGGTGTAGTCATCGCCAATTTGGATGC 

TGATGCACACAAAGCCCTTGGCGAGAAATATGGAGTGAGTGGATTCCCAACATT 

GAAATTCTTCCCAAAGGACAACAAAGCTGGTCACGATTATGACGGTGGCAGGGA 

TTTAGATGACTTTGTAAGCTTCATCAACGAGAAATCTGGGACCAGCAGGGACAGT 

AAAGGGCAGCTTACTTCAAAGGCTGGTATAGTCGAAAGCTTAGATGCTTTGGTAA 

AAGAGTTAGTTGCAGCTAGTGAAGATGAGAAGAAGGCAGTGTTGTCTCGCATAG 

AAGAGGAAGCAAGTACCCTTAAGGGCTCCACCACGAGGTATGGAAAGCTTTAC^T 

TGAAACTCGCAAAGAGCTACATAGAAAAAGGTTCAGACTATGCTAGCAAAGAAA 

CGGAGAGGCTTGK3ACGGGTGCTTGGGAAGTCGATAAGTCCAGTGAAAGCTGATG 

AACTCACTCTCAAGAGAAATATCCTAACCACGTTCGTTGCTTCTTCTTAA 

SEQ ID NO:70, Deduced amino acid sequence of the open reading frame of OO-8
MAKSQIWFGFALLALLLVSAVADDVVVLTDDSFEKEVGKDKGALVEFYAPWCGHC 

KKLAPEYEKLGASFKKAKSVLIAKVDCDEQKSVCTKYGVSGYPTIQWFPKGSLEPQK 

YEGPRNAEALAEYVNKEGGTNVKEAAVPQNVWLTPDNFDEWLDQNKDVLVEFY 

APWCGHCKSLAPTYEKVATVFKQEEGWIANLDADAHKALGEKYGVSGFPTLKFFP 

KDNKAGHDYDGGRDLDDFVSFINEKSGTSRDSKGQLTSKAGIVESLDALVKELVAA 

SEDEKKAVLSRIEEEASTLKGSTTRYGKLYLiaiAKSYIEKGSDYASKETERLGRVLGK 

SISPVKADELTLKRNILTTFVASS 
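
The correspondence between a nucleotide listing and its deduced amino acid listing can be checked by translating the open reading frame with the standard genetic code. The following Python sketch is illustrative only and is not part of the original disclosure; the names CODON_TABLE and translate are assumptions introduced here, and the example input is the first 21 nucleotides of SEQ ID NO:69 as printed above.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(orf: str) -> str:
    """Translate an open reading frame, stopping at the first stop codon.

    Codons containing characters other than A, C, G, T are reported as 'X'.
    """
    orf = orf.upper()
    protein = []
    for i in range(0, len(orf) - 2, 3):
        residue = CODON_TABLE.get(orf[i:i + 3], "X")
        if residue == "*":  # stop codon ends the reading frame
            break
        protein.append(residue)
    return "".join(protein)


# First 21 nucleotides of SEQ ID NO:69 as printed in this listing.
print(translate("ATGGCGAAATCTCAGATCTGG"))  # MAKSQIW
```

The printed result agrees with the first seven residues of SEQ ID NO:70. A check of this kind can also flag positions where a scanned listing has been corrupted, since a misread base usually changes the translated residue.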

SEQ ID NO:71, Nucleotide sequence of the open reading frame of OO-9

ATGGCGTCGAGCGATGAGCGTCCAGGAGCGTATCCGGCACGTGACGGATCAGAG 

AACTTACCTCCGGGAGATCCAAAGACGATGAAGACGGTGGTGATGGATAAAGGA 

GCGGGGATGATGCAATCGTTGAAACCGATCAAACAGATGAGTCTCCATTTGTGTT 

CTTTCGCTTGTTATGGTCACGATCCTAGCCGTCAGATTGAAGTCAACTTCTATGTT 

CATCGACTCAACCAAGACTTTCTTCAATGTGCTGTTTACGATTGCGACTCCTCTAA

ACCCCATCTCATCGGGATCGAGTATATTGTGTCGGAGAGGTTATTTGAGAGTCTT 

GATCCGGAGGAGCAAAAGCTTTGGCACTCTCATGACTATGAGATCCAAACAGGC 

CTTCTAGTAACTCCAAGGGTCCCTGAGCTTGTAGCTAAGACAGAGCTTGAAAATA 

TTGCCAAAACTTATGGGAAGTTTTGGTGCACTTGGCAGACCGATCGCGGGGATAA 

ATTGCCACTTGGTGCACCATCACTTATGATGTCACCACAAGACGTGAATATGGGA

AAGATCAAGCCAGGGCTATTGAAGAAACGTGACGATGAGTATGGAATCTCGACG 

GAATCTTTGAAGACGTCTCGAGTTGGAATTATGGGACCGGAGAAGAAAAATTCG

ATGGCTGATTATTGGGTTCATCACGGAAAAGGATTAGCGGTTGACATAATCGAA 

ACTGAGATGCAGAAATTGGCTCCGTTCCCGTAA 

SEQ ID NO:72, Deduced amino acid sequence of the open reading frame of OO-9
MASSDERPGAYPARDGSENLPPGDPKTMKTVVMDKGAAMMQSLKPIKQMSLHLCS 

FACYGHDPSRQmVOTYVHRLNQDFLQCAVYDCDSSKPHLIGIEYIVSERLFESLDPEE 

OKLWHSHDYEIQTGLLVTPRWELVAKTELENIAKTYGKFWCTWQTDRGDKLPLGA 

PSLMMSPQDVNMGKIKPGLLKKRDDEYGISTESLKTSRVGIMGPEKKNSMADYWVH 

HGKGLAVDHETEMQKLAPFP 

SEQ ID NO:73, Nucleotide sequence of the open reading frame of OO-10

ATCGCGACTCTTAAGGTTTCTGATTCTGTTCCTGCTCCTTCTGATGATGCTGAGCA 

ATTGAGAACCGCTTTTGAAGGATGGGGTACGAACGAGGACTTGATCATATCAAT 

CTTGGCTCACAGAAGTGCTGAACAGAGGAAAGTCATCAGGCAAGCATACCACGA 

AACCTACGGCGAAGACCTTCTCAAGACTCTTGACAAGGAGCTCTCTAACGATTrc 

GAGAGAGCTATCTTGTTGTGGACTCTTGAACCCGGTGAGCGTGATGCTTTATTGG 
CTAATGAAGCTACAAAAAGATGGACTTCAAGCAACCAAGTTCTTATGGAAGTTG
CTOCA^GGACATCAACGCAGCTGCTTCACGCTAGGCAAGCTTACCATGCTCG 

C^CAAGAAGTCTCTTGAAGAGGACGTTGCTCACCACACTACCGGTC^ 

AAGCITITGGTTTCTCTTGTTACCTCATACAGGTACGAAGGAGATGAAGTGAACA 

TGACATTGGCTAAGCAAGAAGCTAAGCTGGTCCATGAGAAAATCAAGGACAAGC 

ACTACAATGATGAGGATGTTATTAGAATCTTGTCCACAAGAAGCAAAGCTCAGA 

TCAATGCTACTTTTAACCGTTACCAAGATGATCATGGCGAGGAAATTCTCAAGAG 

TOTGAGGAAGGAGATGATGATGACAAGTC^ 
CAGTGCITGACAAGACCAGAGCTITACTTTGTCGATGTC 

ACAAAACTGGAACTGATGAAGGAGCACTCACTAGAATTGTGACCACAAGAGCTG 
AGATTGACTTGAAGGTCATTGGAGAGGAGTACCAGCGCAGGAACAGCATTCCTT 
TGGAGAAAGCTATTACCAAAGACACTCGTGGAGATTACGAGAAGATGCTCGTCG 

CACTTCTCGGTGAAGATGATGCTTAA 

SEQ ID NO:74, Deduced amino acid sequence of the open reading frame of OO-10
MATLKVSDSVPAPSDDAEQLRTAFEGWGTNEDLnSILAHRSAEQRKVIRQAYHETY 

GEDLLKTLDKELSNDFERA1LLWTLEPGERDALLANEATKRWTSSNQVLMEVACTR 

TSTOLLHARQAYHARYKKSLEEDVAHHTTGDFRKLLVSLVTSYRYEGDEVNMTLA 

KOEAKLVHEKIKDKHYlSlDEDVIPJLSTRSK^ 

DDKFLALLRSTIQCLTPO>ELWVDVLRSAINKTGTDEGALTmVTTP^IDLKVIGEEY 
QRRNSIPLEKAITKDTRGDYEKMLVALLGEDDA 

SEQ ID NO:75, Nucleotide sequence of the open reading frame of OO-11
ATGGTGGATCTATTGAACTCGGTGATGAACCTGGTGGCGCCTCCAGCGACCATGG 

TGGTGATGGCCTTTGCATGGCCATTACTGTCTTTCATTAGCTTCTCCGAACGGGCT 

TACAACTCTTATTTCGCCACCGAAAATATGGAAGATAAAGTAGTTGTCATCACCG 

GAGCTTCATCGGCCATTGGAGAGCAAATAGCATATGAATATGCAAAAAGAGGAG 

CGAATTTGGTGTTGGTGGCGAGGAGAGAGCAGAGACTGAGAGTTGTGAGTAATA 

AGGCTAAACAGATTGGAGCCAACCATGTGATCATCATCGCTGCTGATGTCATCAA 

AGAAGATGACTGCCGCCGTTTTATCACCCAAGCCGTCAACTATTACGGCC^ 

GATCATCTAGTGAATACAGCGAGTCTTGGACACACTTTTTACTTTGAC^AAGTCA 

GTGACACGACTGTGTTTCCACATTTGCTGGACATAAACTTCTGGGGGAATG^ 

TCCGACATACGTAGCGTTGCCATACCTTCACCAGACGAATGGCCGAATAGTCGTG 

AATGCATCGGTTGAAAACTGGTTGCCTCTACCACGGATGAGTCTTTATTCTGCTG 

CAAAAGCAGCATTAGTCAACTTCTATGAGACGCTGCGTTTCGAGCTAAATGGAG 

ACGTTGGTATAACTATCGCGACTCACGGGTGGATTGGCAGTGAGATGAGTGGAG 

GAAAGTTCATGCTAGAAGAAGGTGCTGAGATGCAATGGAAGGAAGAGAGAGAA

GTACCTGCAAATGGTGGACCGCTAGAGGAATTTGCAAAGATGATTGTGGCAGGA 

GCTTGTAGGGGAGATGCATATGTGAAGTTTCCAAACTGGTACGATGTCmCTCC 

TCTATCGAGTCTTCACACCGAATGTACTGAGATGGACATTCAAGTTGTTACTGTC 

TACTGAGGGTACACGTAGAAGCTCCCTTGTTGGGGTCGGGTCAGGTATGCCTGTG 

GATGAATCCTCTTCACAAATGAAACTTATGCTTGAAGGAGGACCACCTCGAGTTC 

CTGCAAGCCCACCTAGGTATACCGCAAGCCCACCTCATTATACCGCAAGJTCCACC

ACGGTATCCTGCAAGCCCACCTCGGTATCCTGCGAGCCCACCTCGGTTTTCACAG 

TTTAATATCCAAGAGTTGTAA 

SEQ ID NO:76, Deduced amino acid sequence of the open reading frame of OO-11
M^LLNSVM^VAPPATMVVMAFAWPLLSFISFSERAYNSYEATEmiEDK\^TG 

ASSAIGEQIAYEYAKRGANLVLVARREQRLRWSNKAKQIGA^ 

RRFITQAVNYYGRVDHLVNTASLGHTFYFEEVS 
PYLHOTNGRIVWASVENWLPLPRMSLYSAAKAALVNFYETLRFELNGDVGITIATH
GWIGSEMSGGKFMLEEGAEMQWKEEREVPANGGPLEEFAKMIVAGACRGDAYVKF 
PNWYDVFLLYRVFTPNVLRWTFKLLLSTEGTRRSSLVGVGSGMPVDESSSQMKLML 
EGGPPRVPASPPRYTASPPHYTASPPRYPASPPRYPASPPRFSQFNIQEL 

SEQ ID NO:77, Nucleotide sequence of the open reading frame of OO-12

ATGGCTGGAAAACTCATGCACGCTCTTCAGTACAACTCTTACGGTGGTGGCGCCG 

CCGGATTAGAGCATGTTCAAGTTCCGGTTCCAACACCAAAGAGTAATGAGGTTTG 

CCTGAAATTAGAAGCTACTAGTCTAAACCCTGTTGATTGGAAAATTCAGAAAGG 

AATGATCCGCCCATTTCTGCCCCGCAAGTTCCCCTGCATTCCAGCTACTGATGTTG 

CTGGAGAGGTCGTTGAGGTTGGATCAGGAGTAAAAAATTTTAAGGCTGGTGACA 

AAGTTGTAGCGGTTCTTAGCCATCTAGGTGGAGGTGGACTTGCTGAGTTCGCTGT 

TGCAACCGAGAAGCTGACTGTCAAAAGACCTCAAGAAGTGGGAGCAGCTGAAGC 

AGCAGCTTTACCTGTGGCGGGTCTAACCGCTCTCCAAGCTCTTACTAATCCTGCG 

GGGTTGAAGCTGGATGGTACAGGCAAGAAGGCGAACATCCTGGTCACAGCAGCA 

TCTGGTGGGGTTGGTCACTATGCAGTCCAGCTGGCAAAACTTGCAAATGCTCACG 

TAACCGCTACATGTGGTGCCCGGAACATAGAGTTTGTCAAATCGTTGGGAGCGG 

ATGAGGTTCTCGACTACAAGACTCCCGAGGGAGCCGCCCTCAAGAGTCCGTCGG 

GTAAAAAATATGACGCTGTGGTCCATTGTGCAAACGGGATTCCATTTTCGGTATT 

CGAACCAAATTTGTCGGAAAACGGGAAGGTGATAGACATCACACCGGGGCCTAA 

TGCAATGTGGACTTATGCGGTTAAGAAAATAACCATGTCAAAGAAGCAGTTAGT 

GCCACTCTTGTTGATCCCAAAAGCTGAGAATTTGGAGTTTATGGTGAATCTAGTG 

AAAGAAGGGAAAGTGAAGACAGTGATTGACTCAAAGCATCCTTTGAGCAAAGCG 

GAGGATGCTTGGGCCAAAAGTATCGATGGTCATGCTACTGGGAAGATCATTGTC 

GAGCCATAA 

SEQ ID NO:78, Deduced amino acid sequence of the open reading frame of OO-12
MAGKLMHALQYNSYGGGAAGLEHVQVPVPTPKSNEVCLKLEATSLNPVDWKIQKG 

MIRPFLPRKFPCIPATDVAGEVVEVGSGVKNFKAGDKVVAVLSHLGGGGLAEFAVA 

TEKLTVK^QEVGAAEAAALPVAGLTALQALTNPAGLKLDGTGKKANILVTAASGG 

VGHYAVQLAKLANAHVTATCGARNIEFVKSLGADEVLDYKTPEGAALKSPSGKKY 

DAVVHCANGIPFSWEPNLSENGKVIDITPGPNAMWTYAVKKITMSKKQLVPLLLIPK 

AENLEFMVNLVKEGKVKTVIDSKHPLSKAEDAWAKSIDGHATGKirVEP 

SEQ ID NO:79, Nucleotide sequence of the open reading frame of pp82
ATGGAAATTCCCTTAGGTCGAGATGGCGAGGGTATGCAGTCAAAGCAGTGCCCG 

CGCGGCCACTGGCGTCCAGCGGAAGACGACAAGCTGCGAGAACTAGTGTCCCAG 

TTTGGACCTCAAAACTGGAATCTCATAGCAGAGAAACTTCAGGGTCGATCAGGG 

AAAAGCTGCAGGCTACGGTGGTTCAATCAGCTGGACCCTCGCATCAACCGGCAC 

CCATTCTCGGAAGAAGAGGAAGAGCGGCTGCTTATAGCACACAAGCGCTACGGC 

AACAAGTGGGCATTGATCGCGCGCCTCTTTCCGGGCCGCACAGACAACGCGGTG 

AAGAATCACTGGCACGTTGTGACGGCAAGACAGTCCCGTGAACGGACACGAACT 

TACGGCCGTATCAAAGGTCCGGTACATCGAAGAGGCAAGGGTAACCGTATCAAT 

ACCTCCGCACTTGGAAATTACCATCACGATTCGAAGGGAGCTCTCACAGCCTGGA 

TTGAGTCGAAGTATGCGACAGTCGAGCAGTCTGCGGAAGGGCTCGCTAGGTCTC 

CTTGTACCGGCAGAGGCTCTCCTCCTCTACCCACCGGTTTCAGTATACCGCAGAT 

TTCCGGCGGCGCCTTCCATCGACCGACAAACATGAGTACTAGTCCTCTTAGCGAT 

GTGACTATCGAGTCGCCAAAGTTTAGCAACTCCGAAAATGCGCAAATAATAACC 

GCGCCCGTCCTGCAAAAGCCAATGGGAGATCCCAGGTCAGTATGCTTGCCGAATT 

CGACTGTTTCCGACAAGCAGCAAGTGCTGCAGAGTAATTCCATCGACGGTCAGAT 
CTCCTCCGGGCTCCAGACAA^^

CATTTCAATGAATCATCAAGCACCGGATATGTCCTGTGTTGGATTGAAGTCAAAT 

TTTCAGGGGAGTCTCCATCCTGGCGCTGTTAGATCTTCTTGGAATC 
CCACTGTTTTGGCCACAGTAACAAGTTGGTGGAGGAGTGCAGGAGTTCTACAGG 

CGCATGCACTCAACGCTCTGAGATTCTGCAAGA 

AAATGCAGCACTGCGTACAATACTGGAAGATATCAACATGAAAACCTTTGTGGG 
CCAGC^TTCTCGCAACAAGACACA 

CATTCTCCGGCCTAGTGAAGCATCGCCAAGAGAGGTTGTGCAAAGATAGTGGAT 
CTGCTCTCAAGCTGGGACTATCATGGGTTACAT^ 

TGTOCCAAAATGTCAGCATCGCAGCCAGAGCAGTCTGCGCCGGTTGCATTCATT 
GATTTTCTAGGCGTGGGAGCGGCCTGA 

SEQ ID NO:80, Deduced amino acid sequence of the open reading frame of pp82

MEIPLGRDGEGMQSKQCPRGHWRPAEDDKLRELVSQFGPQNWNLIAEKLQGRSGKS 

CRLRWFNOLDPRINRHPFSEEEEERLLIAHKRYGNKWALIARLFPGR 
HWTAROSRERTRTYGRIKGPVHRRGKGNRINTSALGNYHHDSKGALTAWIESKYA 

TVEQSAEGLARSPCTGRGSPPLPTGFSIPQISGGAFHRPTNMSTSPLSDVTIESPKF^NS 
F^AOnTAPVLQKPMGDPRSVCLPNSTVSDKQQVLQSNSIDGQISSGLQTSAWAHDE 
^GVISMNHQAPDMSCVGLKSNFQGSLHPGAVRSSWNQSLPHCFGHSNKLVEECRS 
STGACTERSEILQEQHSSLQFKCSTAYNTGRYQHENLCGPAFSQQDTAmyAOTSlT. 
AFSGLVKHRQERLCKDSGSALKLGLSWVTSDSTLDLSVAKMSASQPEQSAPVAFIDF 

LGVGAA 

SEQ ID NO:81, Nucleotide sequence of the open reading frame of Pk225

ATGGAGATGAACATTAAGTTTCCAGTTATAGACTTGTCTAAGCTCA^GGTGAAG 

AGAGAGACCAAACCATGGCTTTGATCGACGATGCTTGTCAAAACTGGGGOTCTT 

CGAGCTGGTGAACCATGGACTACCATATGATCTAATGGACAACATTGAGAGGAT 

GACAAAGGAACACTACAAGAAACATATGGAACAAAAGTTCAAAGAAATGCTTCG 

TTCCAAAGGTTTAGATACCCTCGAGACCGAAGTTGAAGATGTCGATTGGGAAAG 

CACTITCTACCTCCATCATCTCCCTCAAT^^ 

CAAATGAATACCGATTGGCAATGAAGGATTTTGGGAAGAGGCTTGAGATTCTAG 

CTGAAGAGCTATTGGACTTGTTGTGTGAGAATCTAGGGTTGGAGAAAGGGTACTT 

GAAGAAGGTGTTTCATGGGACAACGGGTCCAACTTTTGCGACAAAGCTTAGCAA 

CTATCCACCATGTCCTAAACCAGAGATGATCAAAGGGCTTAGGGCTCACACAGA 

TGGAGGAGGCCTCATTTTGCTGTTTCAAGATGATAAGGTCAGTGGTCTCCAGCrT 

CTTAAAGATGGTGATTGGGTTGATGTTCCTCCTCTCAAGCATTCCATTGTCATCAA 

CCTTGGTGACCAACTTGAGGTGATAACAAACGGGAAGTACAAGAGTGTAATGCA 

CCGTGTGATGACCCAGAAAGAAGGAAACAGGATGTCTATCGCGTCGTTTTACAA 

CCCCGGAAGCGATGCTGAGATCTCTCCGGCAACATCTCTTGTGGATAAAGACTCA 

AAATACCCAAGCTTTGTGTTTGATGACTACATGAAACTCTATGCCGGACTCAAGT 

TTCAGGCCAAGGAGCCACGGTTCGAGGCGATGAAAAATGCTGAAGCAGCTGCGG 

ATTTGAATCCGGTGGCTGTGGTTGAGACATTCTAA 

SEQ ID NO:82, Deduced amino acid sequence of the open reading frame of Pk225
MEMNKFPVIDI£KLNGEERDQTMALroDAC 

KEHYKKmiEQKEKEMLRSKGLDTLETEVEDVDWESTFYLHHLPQSNLYDIPDMSl^ 
YRIAMKDFGKRI^ILAEELLDLLCENLGI^KGYLKKWHGTTGPTFATKLSNYPPCP 
CLAIMS

WE CLAIM: 

1. An isolated LMP nucleic acid comprising a polynucleotide sequence encoding a 
polypeptide that functions as a modulator of a seed storage compound in a plant, wherein the 
polynucleotide sequence is selected from the group consisting of: 

a) a polynucleotide sequence as defined in SEQ ID NO:1, SEQ ID NO:3, SEQ
ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID
NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID 
NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID 
NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID 
NO:45, SEQ ID NO:47, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID 
NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID 
NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID 
NO:77, SEQ ID NO:79, and SEQ ID NO:81; and

b) a polynucleotide sequence encoding a polypeptide as defined in: SEQ ID 
NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12,
SEQ ID NO:13, SEQ ID NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, 
SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, 
SEQ ID NO:34, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, 
SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:52, SEQ ID NO:54, 
SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64, 
SEQ ID NO:66, SEQ ID NO:68, SEQ ID NO:70, SEQ ID NO:72, SEQ ID NO:74, 
SEQ ID NO:76, SEQ ID NO:78, SEQ ID NO:80, and SEQ ID NO:82. 

2. The isolated LMP nucleic acid of Claim 1, wherein the polynucleotide sequence encodes
a polypeptide sequence selected from the group consisting of SEQ ID NO:2, SEQ ID NO:4,
SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:13, SEQ ID
NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26,
SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, SEQ ID NO:34, SEQ ID NO:36, SEQ ID
NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48,
SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, SEQ ID
NO:62, SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:68, SEQ ID NO:70, SEQ ID NO:72,
SEQ ID NO:74, SEQ ID NO:76, SEQ ID NO:78, SEQ ID NO:80, and SEQ ID NO:82.

3. The isolated LMP nucleic acid of Claim 1, wherein the polynucleotide sequence is
selected from the group consisting of SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID
NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ
ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, SEQ ID 
NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, 
SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:51, SEQ ID 
NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, 
SEQ ID NO:65, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID 
NO:75, SEQ ID NO:77, SEQ ID NO:79, and SEQ ID NO:81. 

4. An isolated nucleic acid comprising a polynucleotide of at least 60 consecutive
nucleotides of the LMP nucleic acid of Claim 1.

5. An isolated nucleic acid comprising a polynucleotide having at least 70% sequence 
identity with the LMP nucleic acid of Claim 1.

6. An isolated nucleic acid comprising a polynucleotide having at least 90% sequence 
identity with the LMP nucleic acid of Claim 1. 

7. An isolated nucleic acid comprising a polynucleotide complementary to the LMP 
nucleic acid of Claim 1. 

8. An isolated nucleic acid that hybridizes under stringent conditions to the nucleic acid 
of Claim 1. 

9. An expression vector comprising an LMP nucleic acid of Claim 1.

10. The expression vector of Claim 9, wherein the LMP nucleic acid is operatively linked 
to a heterologous promoter selected from the group consisting of a seed-specific promoter, a 
root-specific promoter, and a non-tissue-specific promoter. 
11. A method of producing a transgenic plant having a modified level of a seed storage
compound comprising, transforming a plant cell with an expression vector comprising a lipid
metabolism protein (LMP) nucleic acid and generating from the plant cell the transgenic
plant, wherein the nucleic acid encodes a polypeptide that functions as a modulator of a seed 
storage compound in the plant, and wherein the nucleic acid comprises a polynucleotide 
sequence selected from the group consisting of: 

a) a polynucleotide sequence as defined in SEQ ID NO:1, SEQ ID NO:3, SEQ
ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID
NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID 
NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID 
NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID 
NO:45, SEQ ID NO:47, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID 
NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID 
NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID 
NO:77, SEQ ID NO:79, and SEQ ID NO:81; and 

b) a polynucleotide sequence encoding a polypeptide as defined in SEQ ID 
NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, 
SEQ ID NO:13, SEQ ID NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, 
SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, 
SEQ ID NO:34, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, 
SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:52, SEQ ID NO:54, 
SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64,
SEQ ID NO:66, SEQ ID NO:68, SEQ ID NO:70, SEQ ID NO:72, SEQ ID NO:74,
SEQ ID NO:76, SEQ ID NO:78, SEQ ID NO:80, and SEQ ID NO:82.

12. The method of Claim 11, wherein the LMP nucleic acid comprises a polynucleotide 
sequence selected from the group consisting of SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5,
SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID
NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27,
SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:37, SEQ ID
NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:51,
SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID
NO:63, SEQ ID NO:65, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73,
SEQ ID NO:75, SEQ ID NO:77, SEQ ID NO:79, and SEQ ID NO:81.
13. The method of Claim 11, wherein the LMP nucleic acid comprises a polynucleotide
sequence encoding a polypeptide selected from the group consisting of SEQ ID NO:2, SEQ
ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:13,
SEQ ID NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID
NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, SEQ ID NO:34, SEQ ID NO:36,
SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID
NO:48, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60,
SEQ ID NO:62, SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:68, SEQ ID NO:70, SEQ ID
NO:72, SEQ ID NO:74, SEQ ID NO:76, SEQ ID NO:78, SEQ ID NO:80, and SEQ ID
NO:82.

14. The method of Claim 11, wherein the level of a seed storage compound is increased 
in the transgenic plant as compared to the wild type plant. 

15. The method of Claim 14, wherein the LMP nucleic acid encodes the polypeptide as 
defined in SEQ ID NO:28. 

16. The method of Claim 11, wherein the LMP nucleic acid is operatively linked to a 
heterologous promoter selected from the group consisting of a seed-specific promoter, a root- 
specific promoter, and a non-tissue-specific promoter. 

17. The method of Claim 11, wherein the modified level of the seed storage compound is
due to the overexpression or down-regulation of the LMP nucleic acid. 

18. A method of producing a transgenic plant having a modified level of a seed storage 
compound comprising, transforming a plant cell with an expression vector comprising an 
LMP nucleic acid, and generating from the plant cell the transgenic plant, wherein the LMP 
nucleic acid comprises a polynucleotide sequence that encodes a polypeptide that functions 
as a modulator of a seed storage compound in the plant, and wherein the LMP nucleic acid 
comprises a polynucleotide of at least 60 consecutive nucleotides of the LMP nucleic acid of
Claim 1. 
19. A method of producing a transgenic plant having a modified level of a seed storage
compound comprising, transforming a plant cell with an expression vector comprising an
LMP nucleic acid and generating from the plant cell the transgenic plant, wherein the nucleic
acid encodes a polypeptide that functions as a modulator of a seed storage compound in the
plant, and wherein the LMP nucleic acid comprises a polynucleotide having at least 70%
sequence identity with the LMP nucleic acid of Claim 1.

20. The method of Claim 19, wherein the LMP nucleic acid comprises a polynucleotide 
having at least 90% sequence identity with the LMP nucleic acid of Claim 1.

21. A method of producing a transgenic plant having a modified level of a seed storage
compound comprising, transforming a plant cell with an expression vector comprising an LMP
nucleic acid and generating from the plant cell the transgenic plant, wherein the nucleic acid
encodes a polypeptide that functions as a modulator of a seed storage compound in the plant,
and wherein the LMP nucleic acid comprises a first nucleic acid that hybridizes under
stringent conditions to the nucleic acid of Claim 1.

22. A method of producing a transgenic plant having a modified level of a seed storage 
compound comprising, transforming a plant cell with an expression vector comprising an LMP
nucleic acid and generating from the plant cell the transgenic plant, wherein the nucleic acid 
encodes a polypeptide that functions as a modulator of a seed storage compound in the plant, 
and wherein the LMP nucleic acid comprises a polynucleotide complementary to the LMP 
nucleic acid of Claim 1. 

23. The method of any one of Claims 11, 18, 19, 20, 21, or 22, wherein the modified level 
of the seed storage compound is due to the overexpression or down-regulation of the LMP 
nucleic acid. 

24. A method of modulating the level of a seed storage compound in a plant comprising,
modifying the expression of an LMP nucleic acid in the plant, wherein the LMP nucleic acid
is selected from the group consisting of the LMP nucleic acids of Claims 1, 4, 5, 6, 7, or 8.
25. The method of any one of Claims 11, 18, 19, 20, 21, or 22, wherein the LMP nucleic
acid encodes a polypeptide that contains a DNA-binding domain.






WO 2004/013304 

PCT/US2003/024364

26. The method of Claim 25, wherein the LMP nucleic acid encodes a polypeptide
selected from the group consisting of SEQ ID NO:2, SEQ ID NO:16, SEQ ID NO:28, SEQ 
ID NO:34, SEQ ID NO:64, SEQ ID NO:74, and SEQ ID NO:80. 

27. The method of Claims 11, 18, 19, 20, 21, or 22, wherein the nucleic acid encodes a
polypeptide that contains a protein kinase domain.

28. The method of Claim 27, wherein the nucleic acid encodes a polypeptide selected 
from the group consisting of SEQ ID NO:20, SEQ ID NO:44, SEQ ID NO:46, and SEQ ID 
NO:62. 

29. The method of Claims 11, 18, 19, 20, 21, or 22, wherein the nucleic acid encodes a 
polypeptide that contains a signal transduction domain. 

30. The method of Claim 29, wherein the nucleic acid encodes a polypeptide selected 
from the group consisting of SEQ ID NO:4, SEQ ID NO:12, SEQ ID NO:42, SEQ ID NO:48, 
SEQ ID NO:56, SEQ ID NO:68, and SEQ ID NO:72. 

31. The method of Claims 11, 18, 19, 20, 21, or 22, wherein the nucleic acid encodes a 
polypeptide that contains a protease domain. 

32. The method of Claim 31, wherein the nucleic acid encodes a polypeptide selected 
from the group consisting of SEQ ID NO:8, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:52, 
and SEQ ID NO:66. 

33. The method of Claims 11, 18, 19, 20, 21, or 22, wherein the nucleic acid encodes a 
polypeptide that contains a lipid metabolism domain. 

34. The method of Claim 33, wherein the nucleic acid encodes a polypeptide selected 
from the group consisting of SEQ ID NO:6, SEQ ID NO:10, SEQ ID NO:14, SEQ ID NO:18, 
SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, and SEQ ID NO:30. 

35. The method of Claims 11, 18, 19, 20, 21, or 22, wherein the nucleic acid encodes a 
polypeptide that contains an oxidoreductase domain. 

36. The method of Claim 35, wherein the nucleic acid encodes a polypeptide selected 
from the group consisting of SEQ ID NO:32, SEQ ID NO:36, SEQ ID NO:54, SEQ ID 
NO:58, SEQ ID NO:60, SEQ ID NO:70, SEQ ID NO:76, SEQ ID NO:78, and SEQ ID 
NO:82. 

37. A transgenic plant made by any one of the methods of Claims 11, 18, 19, 
20, 21, or 22, wherein expression of the LMP nucleic acid in the plant results in a modified 
level of a seed storage compound in the plant as compared to a wild type variety of the plant. 

38. The transgenic plant of Claim 37, wherein the plant is a dicotyledonous plant. 

39. The transgenic plant of Claim 37, wherein the plant is a monocotyledonous plant. 

40. The transgenic plant of Claim 37, wherein the plant is an oil producing species. 

41. The transgenic plant of Claim 37, wherein the plant is selected from the group 
consisting of rapeseed, canola, linseed, soybean, sunflower, maize, oat, rye, barley, wheat, 
sugarbeet, tagetes, cotton, oil palm, coconut palm, flax, castor, and peanut. 

42. The transgenic plant of Claim 37, wherein the level of the seed storage compound is 
increased in the transgenic plant as compared to the wild type variety of the plant. 

43. The transgenic plant of Claim 42, wherein the LMP nucleic acid encodes the 
polypeptide as defined in SEQ ID NO:28. 

44. The transgenic plant of Claim 37, wherein the seed storage compound is selected 
from the group consisting of a lipid, a fatty acid, a starch, and a seed storage protein. 

45. A seed produced by the transgenic plant of Claim 37, wherein the plant expresses the 
LMP polypeptide and wherein the plant is true breeding for a modified level of the seed 
storage compound as compared to a wild type variety of the plant. 

46. A seed oil produced by the seed of Claim 45. 